DP AND E2F PROTEIN NUCLEAR LOCALISATION SIGNALS AND THEIR USE

The present invention relates to the use of the E region of
the transcription factor DP-3 as a target for novel assays
and its use as a nuclear localisation signal.

The orderly progress of cells through the cell cycle
involves a number of control points which assess the status
of the intracellular and extracellular environment. A major
control point, occurring as cells enter S phase, involves
the cellular transcription factor E2F, a molecule implicated
in the regulation of S phase gene expression (Nevins, 1992;
La Thangue, 1994; Müller, 1995; Weinberg, 1995). An
important role for E2F in early cell cycle control is
suggested by the nature of the proteins which influence its
transcriptional activity. For example, members of the group
of pocket proteins, exemplified by the retinoblastoma tumour
suppressor gene product (pRb), repress the transcriptional
activity of E2F (Hiebert et al., 1992; Zamanian and La
Thangue, 1992; 1993; Schwarz et al., 1993; Wolf et al.,
1995). The ability to repress E2F correlates with the
capacity of pRb, or its relatives, to negatively regulate
early cell cycle progression (Hiebert et al., 1992; Zamanian
and La Thangue, 1992; Hinds et al., 1992; Zhu et al., 1993;
1995a). Furthermore, growth arrest caused by high-level
expression of pRb can be overcome by increasing the level of
E2F (Zhu et al., 1993), implying that E2F is a principal
physiological target through which pRb exerts its effects on
the cell cycle. Another group of molecules which regulate
cell cycle transitions, the cyclins and their associated
catalytic regulatory subunits, also interact with and
control the activity of E2F (Bandara et al., 1991; Lees et
al., 1992; Zhu et al., 1995b). Cyclins A, E and D together
with an appropriate catalytic subunit can influence the
biological activity of pocket proteins (Hinds et al., 1992;
Dowdy et al., 1993; Ewen et al., 1993; Sherr, 1993), and
direct phosphorylation by cyclin A-cdk2 is believed to
interfere with the DNA binding activity of E2F (Krek et al.,
1994; 1995).

WO 97/43647

The physiological regulation of E2F activity imparted by
these afferent signalling proteins can be subverted by viral
oncoproteins, such as adenovirus E1a, human papilloma virus
E7 and SV40 large T antigen, through their ability to
release active E2F by sequestering pocket proteins and
cyclin/cdk complexes (Bandara and La Thangue, 1991;
Chellappan et al., 1991; 1992; Morris et al., 1993). This
property correlates with the ability of these viral
oncoproteins to transform tissue culture cells, again
implicating E2F as an important physiological target in
virally-mediated oncogenesis.

Considerable progress has been made in elucidating the
composition of E2F. It is now known that the E2F DNA binding
activity defined in mammalian cell extracts is a generic
activity caused by an array of DNA binding heterodimers made
up from two distinct families of proteins, known as E2F and
DP (La Thangue, 1994). Five members of the E2F family, from
E2F-1 to E2F-5, have been isolated, each protein possessing
preferential specificity for pocket proteins (Helin et al.,
1992; Kaelin et al., 1992; Shan et al., 1992; Ivey-Hoyle et
al., 1993; Lees et al., 1993; Beijersbergen et al., 1994;
Ginsberg et al., 1994; Buck et al., 1995; Hijmans et al.,
1995; Sardet et al., 1995). For example, E2F-1 is regulated
by pRb, and E2F-4 by p107 and p130 (Helin et al., 1993a;
Flemington et al., 1993; Beijersbergen et al., 1994; 1995;
Ginsberg et al., 1994; Vairo et al., 1995). Three members
of the DP family are known (Girling et al., 1993; 1994;
Ormondroyd et al., 1995; Wu et al., 1995; Zhang and
Chellappan, 1995), DP-1 being a widespread and constitutive
component of physiological E2F during cell cycle progression
in some cell types (Girling et al., 1993; Bandara et al.,
1994). Supporting their role as dominant regulators of the
cell cycle, both E2F and DP proteins have been shown to
possess proto-oncogenic activity (Johnson et al., 1994;
Jooss et al., 1995).

Our previous characterisation of DP-3 indicated that it is a
novel member of the DP family of proteins and that its RNA
undergoes extensive alternative splicing (Ormondroyd et al.,
1995). Processing events in the 5' untranslated region and
coding sequence of the RNA give rise to a range of products
present in both cell lines and tissues (Ormondroyd et al.,
1995). A sequence of 16 amino acid residues within the N-
terminal region of the DNA binding domain, known as the E
region, is one such region subject to the alternative
splicing of DP-3 RNA. Further, of the four DP-3 protein
products which have been characterised, α and δ constitute
E+ forms, whereas β and γ are E- variants (Ormondroyd et
al., 1995). Although extensive sequence conservation is
apparent across the DP protein family, a comparison of the
known DP protein sequences indicated that they fall into two
categories, being either E+ or E-; for example, DP-1 is an
E- variant.

Description of the Drawing.



Figure 1 shows the DP-3 E-region exon and the patterns of
alternate splicing which give rise to E+ and E- forms of
DP-3.



Disclosure of the Invention. 

In the present study, we have defined a role for the E
region by showing that its inclusion contributes to an
alternatively spliced nuclear localization signal:
specifically, E+ DP-3 proteins accumulate in the nuclei
whereas E- proteins, including DP-1, fail to do so. Without
the E region, DP proteins rely upon an alternative mechanism,
which involves an interaction with an appropriate E2F family
member, for example E2F-1, for nuclear accumulation. These
data define two mechanisms of control in the nuclear
accumulation of the E2F transcription factor, influenced by
alternative splicing of a nuclear localization signal and by
subunit composition, and indicate a hitherto unexpected and
novel level of control in regulating the levels of the
nuclear E2F/DP heterodimer.

The present invention thus provides an assay for a putative
regulator of cell cycle progression which comprises:

a. expressing in a cell a protein comprising (i) an E
   region of a DP-3 protein and sufficient C-terminal
   residues thereof to provide a functional nuclear
   localisation signal (NLS) and (ii) a marker for
   nuclear localization; and

b. determining the degree of nuclear localization in
   the presence and absence of said putative
   regulator.
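By way of illustration only, the comparison in step b might be quantified as the mean nuclear fraction, N/(N+C), of marker signal per transfected cell. The sketch below assumes that per-cell nuclear (N) and cytoplasmic (C) intensities have already been measured, for example by immunofluorescence; the function names and the intensity values are hypothetical and are not data from this study.

```python
from statistics import mean

def nuclear_fraction(nuclear: float, cytoplasmic: float) -> float:
    """Fraction of a cell's marker signal that is nuclear: N / (N + C)."""
    total = nuclear + cytoplasmic
    if total <= 0:
        raise ValueError("cell has no detectable marker signal")
    return nuclear / total

def localisation_shift(untreated, treated) -> float:
    """Change in mean nuclear fraction caused by the putative regulator.

    Each argument is a list of (nuclear, cytoplasmic) intensity pairs,
    one pair per transfected cell.
    """
    before = mean(nuclear_fraction(n, c) for n, c in untreated)
    after = mean(nuclear_fraction(n, c) for n, c in treated)
    return after - before

# Hypothetical intensities (arbitrary units), for illustration only.
untreated = [(90.0, 10.0), (80.0, 20.0)]
treated = [(40.0, 60.0), (30.0, 70.0)]
print(round(localisation_shift(untreated, treated), 2))  # -0.5
```

A negative shift, as in this made-up example, is the pattern expected of an antagonist of the NLS; a positive shift would suggest an agonist that enhances localisation.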

In a further embodiment of the invention, the finding that
DP proteins such as DP-1 lack an NLS indicates that
complexes of such DP proteins with an E2F (such as E2F-1) are
localised in the nucleus by the presence of an NLS on the
E2F protein. The DP-3 NLS is not homologous to the E2F NLS.
Thus the E2F NLS forms a further target for antagonists of
nuclear localisation of the DP/E2F complex, particularly
complexes such as DP-1/E2F-1 which do not comprise an E
region. We have identified the nuclear localisation signal
region in E2F-1. This region is identified as residues 85-91
of the human E2F-1 sequence shown as SEQ ID NO. 12 below.
Thus the invention also provides an assay for a putative
regulator of cell cycle progression which comprises:

a. expressing in a cell a protein comprising (i) the
   nuclear localisation signal of E2F-1 and (ii) a
   marker for nuclear localization; and

b. determining the degree of nuclear localization in
   the presence and absence of said putative
   regulator.




The proteins defined in part "a" above will be referred to
as "a protein comprising an NLS-region" and the like for
the sake of brevity.

In one embodiment, the E region comprises the sequence:
SDRKRAREFIDSDFSE (SEQ ID NO. 9) 

However, this E region is derived from the murine DP-3 gene
and other E regions may be used, for example the human E
region or other mammalian E regions. The murine DP-3 alpha,
beta, gamma and delta genes are shown as SEQ ID NOs. 1 and
2, 3 and 4, 5 and 6, and 7 and 8 respectively. Other DP-3
genes may be obtained by routine cloning methods. For
example, the human DP-3 gene may be cloned by probing a cDNA
or genomic library with a nucleic acid probe derived from
a known human DP gene (e.g. DP-1) and/or the murine
DP-3 gene, and positive clones selected and sequenced for
the human DP-3 gene. Similar techniques may be used for
other mammalian DP-3 genes and will be readily apparent to
those of skill in the art.

As described herein, the E region requires a number of C-
terminal residues found in the DP-3 sequence in order to
function as an NLS. Desirably, from 6 to 50, e.g. 8 to 30,
and preferably from 8 to 20 C-terminal residues are used.
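A minimal sketch of how the E region and a chosen number of C-terminal flanking residues might be assembled into a candidate NLS-region, enforcing the 6 to 50 residue window given above. The E region is SEQ ID NO. 9; `DOWNSTREAM_PLACEHOLDER` is a hypothetical stand-in (a run of "X"), not the real DP-3 sequence, which would be taken from SEQ ID NOs. 1 to 8. The function name `nls_region` is likewise illustrative.

```python
E_REGION = "SDRKRAREFIDSDFSE"  # murine DP-3 E region, SEQ ID NO. 9

# Hypothetical placeholder for the residues that follow the E region in DP-3;
# substitute the real sequence before use.
DOWNSTREAM_PLACEHOLDER = "X" * 50

def nls_region(e_region: str, downstream: str, n_flank: int,
               min_flank: int = 6, max_flank: int = 50) -> str:
    """Return the E region followed by n_flank C-terminal flanking residues."""
    if not (min_flank <= n_flank <= max_flank):
        raise ValueError(f"n_flank must be between {min_flank} and {max_flank}")
    if n_flank > len(downstream):
        raise ValueError("not enough downstream sequence supplied")
    return e_region + downstream[:n_flank]

candidate = nls_region(E_REGION, DOWNSTREAM_PLACEHOLDER, 12)
print(len(E_REGION), len(candidate))  # 16 28
```

Narrowing `min_flank` and `max_flank` to 8 and 20 would restrict the sketch to the preferred range.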

Similarly, the NLS of E2F-1 may be used with accompanying N-
or C-terminal residues from the natural sequence of this
protein, although these are not essential for the activity
of the NLS.

Although assays of the invention are preferably based upon
naturally occurring NLS-region sequences and associated C-
terminal regions thereof sufficient to act as an NLS, these
sequences may also be modified by substitution, deletion or
insertion provided that the function of these sequences is
substantially retained. The retention of function may be
tested for in accordance with the description and examples
herein. Such modified and functional NLS-regions are
included within the definition of the terms "an E region of
a DP-3 protein" and "the nuclear localisation signal of
E2F-1".


For example, from 1 to 4 substitutions may be made and these
are preferably conservative substitutions. Examples of
conservative substitutions include those set out in the
following table, where amino acids in the same block in the
second column, and preferably in the same line in the third
column, may be substituted for each other:



ALIPHATIC    Non-polar            G A P
                                  I L V
             Polar - uncharged    C S T M
                                  N Q
             Polar - charged      D E
                                  K R
AROMATIC                          H F W Y
OTHER                             N Q D E



Where deletions or insertions are made, these are preferably
limited in number, for example from 1 to 3 of each.
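The table and the 1 to 4 substitution limit can be turned into a simple screen for candidate variant NLS-regions. This is a sketch only: it assumes the table's blocks and lines are literal residue groups, handles substitutions but not insertions or deletions, and uses the murine E region (SEQ ID NO. 9) as the reference. The names `BLOCKS`, `classify` and `acceptable_variant` are hypothetical, and whether a variant actually retains NLS function still has to be tested as described herein.

```python
# Residue groups transcribed from the table above: each block (second column)
# is a list of lines (third column). AROMATIC has no second-column label.
BLOCKS = {
    "non-polar": ["GAP", "ILV"],
    "polar-uncharged": ["CSTM", "NQ"],
    "polar-charged": ["DE", "KR"],
    "aromatic": ["HFWY"],
    "other": ["NQDE"],
}

def classify(a: str, b: str) -> str:
    """Return 'same line', 'same block' or 'non-conservative' for substitution a -> b."""
    if any(a in line and b in line for lines in BLOCKS.values() for line in lines):
        return "same line"
    if any(a in "".join(lines) and b in "".join(lines) for lines in BLOCKS.values()):
        return "same block"
    return "non-conservative"

def acceptable_variant(reference: str, variant: str, max_subs: int = 4) -> bool:
    """True if variant differs from reference by 1..max_subs conservative substitutions."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length (indels not handled)")
    subs = [(a, b) for a, b in zip(reference, variant) if a != b]
    return 1 <= len(subs) <= max_subs and all(
        classify(a, b) != "non-conservative" for a, b in subs)

E_REGION = "SDRKRAREFIDSDFSE"  # SEQ ID NO. 9
print(acceptable_variant(E_REGION, "SDRRRAREFIDSDFSE"))  # True: K -> R, same line
print(acceptable_variant(E_REGION, "SDGKRAREFIDSDFSE"))  # False: R -> G, different blocks
```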

The cell in which the assay may be conducted is any suitable
eukaryotic cell in which the NLS-regions function as nuclear
localisation signals. Suitable cell types include yeast,
insect or mammalian cells, e.g. primate cells such as COS7
cells.

In the assay according to the invention the marker may be
any polypeptide sequence which allows detection of the
presence and location (i.e. cytoplasmic vs nuclear) of the
protein comprising an NLS-region. Suitable markers include
an antigenic determinant bindable by an antibody, an enzyme
capable of causing a colour change to a substrate, or a
luciferase enzyme.

In a preferred embodiment, the marker comprises a
transcription factor, or subunit thereof, which transcription
factor is capable of activating an indicator gene. This
embodiment avoids the need for detailed examination of the
cell to determine where the marker has located. In this
embodiment, activation of transcription of the indicator
gene will show that the NLS-region has localised the
protein to the nucleus.

For example, in a preferred embodiment of the invention the
protein may comprise a heterologous DNA binding domain such
as that of the yeast transcription factor GAL4. The GAL4
transcription factor comprises two functional domains: the
DNA binding domain (DBD) and the transcriptional activation
domain (TAD). By fusing an NLS-region to one of these
domains and expressing the other domain in the cell, a
functional GAL4 transcription factor is restored only when
both proteins enter the nucleus and interact. Thus,
interaction of the proteins may be measured by the use of an
indicator gene linked to a GAL4 DNA binding site, so that the
restored GAL4 factor activates transcription of the
indicator gene. This assay format is described by Fields and
Song, 1989, Nature 340: 245-246. Other transcriptional
activation domains may be used in place of the GAL4 TAD, for
example the viral VP16 activation domain (Fields and Jang,
1990). In general, fusion proteins comprising DNA binding
domains and/or activation domains may be made.
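The logic of this readout can be sketched as a comparison of indicator gene activity with both fusions present against each fusion alone. The sketch assumes such single-fusion controls are run; the function names, the two-fold threshold and the luciferase readings are hypothetical choices for illustration, not values from this document.

```python
def fold_activation(both: float, dbd_only: float, tad_only: float) -> float:
    """Reporter signal with both fusions, relative to the higher single-fusion control."""
    background = max(dbd_only, tad_only)
    if background <= 0:
        raise ValueError("control readings must be positive")
    return both / background

def reconstituted(both: float, dbd_only: float, tad_only: float,
                  threshold: float = 2.0) -> bool:
    """True if GAL4 activity appears restored, i.e. both fusions reached the
    nucleus and interacted (threshold is a hypothetical cut-off)."""
    return fold_activation(both, dbd_only, tad_only) >= threshold

# Hypothetical luciferase readings (relative light units).
print(fold_activation(5000.0, 250.0, 200.0))  # 20.0
print(reconstituted(300.0, 250.0, 200.0))     # False
```

Using the higher of the two controls as background is a conservative choice: activation by either fusion alone cannot be mistaken for reconstitution.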

The indicator gene may comprise, for example,
chloramphenicol acetyl transferase (CAT) or a luciferase.

The NLS may be located at the C-terminus or N-terminus of
the marker. The NLS may be within all or part of the
DP-3 or E2F protein from which it originates, or may be
solely the NLS sequences identified above which provide the
necessary NLS function. Thus fragments of DP-3 or an E2F
(e.g. E2F-1) of from 15 to 400, e.g. from 20 to 100 or from
30 to 50, amino acids comprising the NLS may be used. Where
the NLS is fused to the N- or C-terminus of a marker, the
fusion may comprise further sequences at its N- or C-
terminus where this is desired or necessary.

In any format, the assay may be used to screen peptides
which regulate the function of an NLS. Regulation of the
function includes antagonising the function to prevent
nuclear localisation, although regulators may also be
agonists which enhance localisation. Regulation of the NLS
may lead to effects such as enhanced cell division, blocking
of cell cycle progression or apoptosis, the latter two being
particularly preferred. Candidate regulators identified in
accordance with the invention may be tested on cells with
wild-type DP and E2F proteins to confirm the effect of
regulating the NLS.

Such regulators will be useful either in themselves as 
potential regulators of cell proliferation or as models for 
rational drug design, e.g. by modelling the tertiary 
structure of the antagonist and devising chemical analogues 
which mimic the structure. 

Candidate regulators include peptides comprising all or part
of a sequence which is from 60 to 100% homologous
(identical) to a portion of an NLS region of the same
length. Extracts of plants which contain several
characterised or uncharacterised components may also be
used.
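One way to read the 60 to 100% criterion is as ungapped percent identity against a same-length window of the NLS region. The sketch below assumes that reading; the candidate peptide is hypothetical, and the E region (SEQ ID NO. 9) stands in for the NLS region.

```python
def percent_identity(peptide: str, region: str) -> float:
    """Percentage of identical residues between two equal-length sequences (no gaps)."""
    if len(peptide) != len(region) or not peptide:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(peptide, region))
    return 100.0 * matches / len(peptide)

def best_window_identity(peptide: str, nls_region: str) -> float:
    """Highest identity of the peptide against any same-length window of the NLS region."""
    k = len(peptide)
    if k > len(nls_region):
        raise ValueError("peptide longer than NLS region")
    return max(percent_identity(peptide, nls_region[i:i + k])
               for i in range(len(nls_region) - k + 1))

E_REGION = "SDRKRAREFIDSDFSE"  # SEQ ID NO. 9
candidate = "RKRARDF"           # hypothetical test peptide
score = best_window_identity(candidate, E_REGION)
print(round(score, 1), 60.0 <= score <= 100.0)  # 85.7 True
```

The best window here is RKRAREF, which matches the candidate at 6 of 7 positions.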






Antibodies directed to the NLS regions form a further class
of putative regulator compounds. Candidate regulator
antibodies may be characterised and their binding regions
determined to provide single chain antibodies and fragments
thereof which are responsible for regulating the
interaction.

Other candidate regulator compounds may be based on
modelling the three-dimensional structure of the NLS regions
and using rational drug design to provide potential inhibitor
compounds with particular molecular shape, size and charge
characteristics.

A regulator substance identified using the present
invention may be peptide or non-peptide in nature. Non-
peptide "small molecules" are often preferred for many in
vivo pharmaceutical uses. Accordingly, a mimetic or mimic
of the substance (particularly if a peptide) may be designed
for pharmaceutical use.

The designing of mimetics to a known pharmaceutically active
compound is a known approach to the development of
pharmaceuticals based on a "lead" compound. This might be
desirable where the active compound is difficult or
expensive to synthesise or where it is unsuitable for a
particular method of administration, e.g. peptides are not
well suited as active agents for oral compositions as they
tend to be quickly degraded by proteases in the alimentary
canal. Mimetic design, synthesis and testing may be used to
avoid randomly screening large numbers of molecules for a
target property.

There are several steps commonly taken in the design of a 
mimetic from a compound having a given target property. 
Firstly, the particular parts of the compound that are 
critical and/or important in determining the target property 
are determined. In the case of a peptide, this can be done 
by systematically varying the amino acid residues in the 
peptide, e.g. by substituting each residue in turn. These 
parts or residues constituting the active region of the 
compound are known as its "pharmacophore". 



Once the pharmacophore has been found, its structure is
modelled according to its physical properties, e.g.
stereochemistry, bonding, size and/or charge, using data
from a range of sources, e.g. spectroscopic techniques, X-
ray diffraction data and NMR. Computational analysis,
similarity mapping (which models the charge and/or volume of
a pharmacophore, rather than the bonding between atoms) and
other techniques can be used in this modelling process.

In a variant of this approach, the three-dimensional
structure of the ligand and its binding partner are
modelled. This can be especially useful where the ligand
and/or binding partner change conformation on binding,
allowing the model to take account of this in the design of
the mimetic.

A template molecule is then selected onto which chemical
groups which mimic the pharmacophore can be grafted. The
template molecule and the chemical groups grafted onto it
can conveniently be selected so that the mimetic is easy to
synthesise, is likely to be pharmacologically acceptable,
and does not degrade in vivo, while retaining the biological
activity of the lead compound. The mimetic or mimetics
found by this approach can then be screened to see whether
they have the target property, or to what extent they
exhibit it. Further optimisation or modification can then
be carried out to arrive at one or more final mimetics for
in vivo or clinical testing.

Antibodies may be obtained using techniques which are
standard in the art. Methods of producing antibodies
include immunising a mammal (e.g. mouse, rat, rabbit, horse,
goat, sheep or monkey) with the protein or a fragment
thereof. Antibodies may be obtained from immunised animals
using any of a variety of techniques known in the art, and
screened, preferably using binding of antibody to antigen of
interest. For instance, Western blotting techniques or
immunoprecipitation may be used (Armitage et al., 1992,
Nature 357: 80-82). Isolation of antibodies and/or
antibody-producing cells from an animal may be accompanied
by a step of sacrificing the animal.

As an alternative, or supplement, to immunising a mammal with
a peptide, an antibody specific for a protein may be
obtained from a recombinantly produced library of expressed
immunoglobulin variable domains, e.g. using lambda
bacteriophage or filamentous bacteriophage which display
functional immunoglobulin binding domains on their surfaces;
for instance see WO92/01047. The library may be naive, that
is constructed from sequences obtained from an organism
which has not been immunised with any of the proteins (or
fragments), or may be one constructed using sequences
obtained from an organism which has been exposed to the
antigen of interest.

Antibodies according to the present invention may be
modified in a number of ways. Indeed the term "antibody"
should be construed as covering any binding substance having
a binding domain with the required specificity. Thus the
invention covers antibody fragments, derivatives, functional
equivalents and homologues of antibodies, including
synthetic molecules and molecules whose shape mimics that
of an antibody enabling it to bind an antigen or epitope.

Examples of antibody fragments capable of binding an
antigen or other binding partner are the Fab fragment
consisting of the VL, VH, CL and CH1 domains; the Fd
fragment consisting of the VH and CH1 domains; the Fv
fragment consisting of the VL and VH domains of a single arm
of an antibody; the dAb fragment, which consists of a VH
domain; isolated CDR regions; and F(ab')2 fragments, a
bivalent fragment including two Fab fragments linked by a
disulphide bridge at the hinge region. Single chain Fv
fragments are also included.
A hybridoma producing a monoclonal antibody according to the
present invention may be subject to genetic mutation or
other changes. It will further be understood by those
skilled in the art that a monoclonal antibody can be
subjected to the techniques of recombinant DNA technology to
produce other antibodies or chimeric molecules which retain
the specificity of the original antibody. Such techniques
may involve introducing DNA encoding the immunoglobulin
variable region, or the complementarity determining regions
(CDRs), of an antibody to the constant regions, or constant
regions plus framework regions, of a different
immunoglobulin. See, for instance, EP184187A, GB 2188638A
or EP-A-0239400. Cloning and expression of chimeric
antibodies are described in EP-A-0120694 and EP-A-0125023.

The amount of a putative regulator which may be screened in
the assay of the invention desirably will be selected to be
a concentration which is within 100-fold (above or below)
the amount of an NLS-region-containing protein in the cell.
By way of guidance, this will mean that typically from about
0.01 to 100 nM concentrations of putative regulator compound
may be used, for example from 0.1 to 10 nM.
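The 100-fold rule and the nanomolar guidance agree if the NLS-region-containing protein is present at roughly 1 nM; that figure is inferred here from the two ranges and is not stated in the text. A small hypothetical helper, `screening_window`, makes the arithmetic explicit:

```python
def screening_window(protein_nM: float, fold: float = 100.0) -> tuple[float, float]:
    """Regulator concentrations within `fold`-fold (above or below) of the protein level."""
    if protein_nM <= 0 or fold < 1:
        raise ValueError("protein_nM must be positive and fold at least 1")
    return protein_nM / fold, protein_nM * fold

low, high = screening_window(1.0)
print(low, high)  # 0.01 100.0
```

With a 10-fold window, `screening_window(1.0, 10.0)` returns (0.1, 10.0), matching the narrower example range; for a measured protein level other than 1 nM the window shifts accordingly.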

The assay of the invention may be conducted using transient
expression vectors or stably transfected cells. In either
case, the protein comprising an NLS-region will be encoded
by nucleic acid (preferably DNA) and said nucleic acid will
be operably linked to a promoter which is functional in the
host cell. The promoter and nucleic acid encoding the
protein comprising an NLS-region will usually be part of a
vector construct which may also contain signals for
termination of transcription, a selectable marker and/or
origins of replication functional in the host cell and/or in
another cell type (e.g. E. coli) so that the vector may be
manipulated and grown in the other cell type.

Where an NLS-region sequence contains substitutions,
deletions or insertions as described above, the alterations
to the sequence may be made by manipulation of the nucleic
acid sequence to alter the relevant codon(s). This can be
achieved by a number of well-known standard techniques, e.g.
site-directed mutagenesis.

Various vectors of this type are described in the Examples
herein, and further vectors may be made by those of skill in
the art in accordance with routine practice in molecular
biology.

In a separate embodiment, the invention also provides a
method of directing expression of a protein in a cell to the
nucleus which comprises modifying said protein such that it
comprises an NLS-region and, in the case of a DP-3-derived
NLS, sufficient C-terminal residues thereof of a DP-3
protein to provide a functional nuclear localisation signal
(NLS).

Such a method may be used to modify a DP protein which does
not normally comprise an E region so that the DP protein
(e.g. DP-1 or DP-2) does localise to the nucleus. This can
be used to study the function of such DP proteins. These
modified proteins are novel and thus form a further aspect of
the invention. Desirably the NLS used to modify a DP protein
is a DP-3-derived NLS.

E2F proteins, particularly E2F-4 and E2F-5 which lack an
NLS, may also be modified by an NLS of the invention.
Desirably the NLS used to modify an E2F protein is an
E2F-1-derived NLS.

Modification of such proteins will usually be achieved
through the use of recombinant DNA techniques, e.g. using
nucleic acid encoding an NLS-region sequence and splicing
it to or into nucleic acid encoding the protein of interest.
The recombinant nucleic acid may be introduced into an
expression vector in a manner analogous to that described
above and the vector introduced into a suitable host cell,
e.g. a host cell in which a promoter operably linked to the
recombinant DNA coding sequence is capable of driving
expression of the DNA. Suitable cell types include those
described above.

The present invention also comprises an assay for a putative
regulator of cell cycle progression which comprises:

a. expressing in a cell (i) an E- DP transcription
   factor or a portion thereof sufficient to form a
   heterodimer with an E2F transcription factor and
   (ii) an E2F transcription factor or portion
   thereof sufficient to form a heterodimer with the
   DP transcription factor or portion thereof and
   direct localisation of said heterodimer to the
   nucleus; and

b. determining the degree of nuclear localization in
   the presence and absence of said putative
   regulator.

The assay may be performed under conditions and within cell
types as described above for the assay of an NLS-region
regulator, and candidate regulators include those described
above for the other assays of the invention.

In this assay, a preferred DP transcription factor is DP-1,
particularly mammalian DP-1, e.g. rodent or primate, e.g.
human. The sequences of human and mouse DP-1 are shown in
SEQ ID Nos. 10 and 11 respectively. A preferred E2F is
E2F-1, particularly mammalian E2F-1, e.g. rodent or primate,
e.g. human E2F-1 (SEQ ID No. 12).



Where a portion of an E- DP transcription factor is used in
such an assay, it may be of any size which is capable of
forming a heterodimer with an E2F transcription factor.
Portions of from 40 to 400, preferably 60 to 200, amino acids
may be made by routine recombinant DNA techniques and tested
in systems analogous to those described above and in the
accompanying examples below for their ability to function as
required. The portions of the DP protein will generally
include substantially all or most of the domain found at
amino acids 160 to 220 in DP-1, which is responsible for
dimerisation with E2F-1. Where a portion of an E2F
transcription factor sufficient to form a heterodimer with
the DP transcription factor is used, this may also be made
and tested as described above for the portion of the DP
factor, and preferably is within the same size ranges and
also comprises substantially all or most of the
heterodimerisation domain.
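When designing DP-1 portions, it may help to check the size range and how much of the dimerisation domain (residues 160 to 220) a portion retains. The text does not quantify "substantially all or most", so this sketch only reports the covered fraction; inclusive residue numbering and the function names are assumptions made for illustration.

```python
DIMERISATION_DOMAIN = (160, 220)  # DP-1 residues responsible for dimerisation with E2F-1

def domain_coverage(start: int, end: int, domain=DIMERISATION_DOMAIN) -> float:
    """Fraction of the domain (inclusive numbering) covered by the portion start..end."""
    d_start, d_end = domain
    overlap = min(end, d_end) - max(start, d_start) + 1
    return max(overlap, 0) / (d_end - d_start + 1)

def portion_ok(start: int, end: int, min_len: int = 40, max_len: int = 400) -> bool:
    """Check the 40 to 400 residue size range for a DP portion."""
    return min_len <= end - start + 1 <= max_len

print(portion_ok(100, 250), domain_coverage(100, 250))  # True 1.0
print(round(domain_coverage(180, 300), 2))              # 0.67
```

Passing `min_len=60, max_len=200` applies the preferred size range instead.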

The following examples illustrate the invention. 

Example 1: The proteins encoded by the spliced variants of 
DP-3 have distinct intracellular distributions. 

The DP-3 gene gives rise to a number of distinct proteins
resulting from alternative splicing of its RNA (Ormondroyd
et al., 1995). Since the DNA binding and transcription
activation properties of the DP-3 variants, referred to as
α, β, γ and δ, are not significantly different (Ormondroyd
et al., 1995), we considered that the variation within the
DP-3 coding sequence may influence other properties of the
proteins, such as their biochemical properties. We
therefore compared the biochemical extraction properties of
β and δ, which constitute E- and E+ forms respectively, by
sequential treatment with increasing salt concentrations,
monitoring the levels of protein extracted from transfected
COS7 cells.

COS7 cells were transfected with plasmids carrying the full-
length coding sequences of DP-3 α, β, γ and δ (Ormondroyd et
al., 1995), which were cloned into pG4mpolyII (Webster et
al., 1989) under the control of the SV40 early promoter.
The pG4DP-3αΔE mutant was constructed by substituting a BsgI
fragment from DP-3β (E-minus) into DP-3α. A number of other
vectors made in connection with other examples are described
here for the sake of brevity. The luciferase expression
vector pGL-2 was supplied by Promega, and the pGL-E vector
was derived from pGL-2 by an in-frame insertion of a 54 bp
XbaI fragment encoding the 16 amino acid residue E region
into a single XbaI site in the luciferase coding region. To
generate pGL-Eb, a PCR fragment was amplified using E5-X
(5'-GCTCTAGAGCCCAGTATAGA-3' (SEQ ID NO: 14)) and E3-X
(5'-GCTCTAGATGTCTCAAGCCTTTCCC-3' (SEQ ID NO: 15)) as primers
and pG4DP-3α (Ormondroyd et al., 1995) as the template, and
cloned into the single XbaI site in pGL-2. pG4-DP-1 has
been described previously (Bandara et al., 1993), and
pRcCMV-HAE2F1 (Krek et al., 1994), expressing HA-tagged human
E2F-1, was a gift of Dr W Krek. pCMV-DP-1/NLS was made by
inserting a PCR-amplified fragment containing the Bel-1
bipartite NLS (amino acid residues 194 to 227) into the KpnI
site (residue 327) of the DP-1 cDNA in pG4-DP-1. The
identity of all constructs was confirmed by sequence
analysis.
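Two simple checks are implicit in this construction: both primers should carry an XbaI recognition site (TCTAGA) for cloning into the XbaI site of pGL-2, and an in-frame insertion must add a multiple of three base pairs. Assuming the 54 bp figure is the net length added to the luciferase coding sequence, 54 = 3 × 18, so the reading frame downstream is preserved. The helper names below are illustrative.

```python
XBAI_SITE = "TCTAGA"  # XbaI recognition sequence

PRIMERS = {
    "E5-X": "GCTCTAGAGCCCAGTATAGA",       # SEQ ID NO: 14
    "E3-X": "GCTCTAGATGTCTCAAGCCTTTCCC",  # SEQ ID NO: 15
}

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def find_sites(seq: str, site: str = XBAI_SITE) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]

def keeps_frame(insert_bp: int) -> bool:
    """An insertion preserves the downstream reading frame only if it is a multiple of 3 bp."""
    return insert_bp % 3 == 0

for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
print(keeps_frame(54), 54 // 3)  # True 18
```

Because TCTAGA is its own reverse complement, scanning one strand is enough to find the site.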

The cells were grown in Dulbecco's modified Eagle's medium
supplemented with 10% foetal calf serum (FCS). Cells were
transfected by the liposome-mediated method, using the
Lipofectin reagent (Gibco BRL) according to the
manufacturer's recommendations. Sixty hours after
transfection, cells were lysed in ice-cold low salt buffer
(LSB; 10 mM Tris-HCl pH 8, 7.5 mM (NH4)2SO4, 1 mM EDTA,
0.025% NP-40) using 0.2 ml of LSB per 6-cm-diameter dish.
Lysates were incubated on ice for 5 min and centrifuged at
3000 rpm for 3 min. The resulting pellets were resuspended
in 0.2 ml of high salt buffer (HSB; 50 mM Tris-HCl pH 8,
150 mM NaCl, 5 mM EDTA, 0.5% NP-40) and centrifuged at 10,000
rpm for 5 min. Both buffers, LSB and HSB, were supplemented
with protease inhibitors and 1 mM dithiothreitol. The
insoluble material contained in the pellets of the last
centrifugation was resuspended in 0.2 ml of SDS-sample
buffer.
Usually, about 5% of each fraction was used in
immunoblotting. Samples were separated on a 10% SDS-
polyacrylamide gel and transferred to nitrocellulose
membranes. The membrane was blocked with 5% dried milk
powder in PBS for 1 h; anti-DP-3 antibody (1:200, rabbit
serum) was then added and incubated for an additional 1 h at
room temperature. After three washes in PBS with 0.2%
Tween-20, the blot was incubated with alkaline phosphatase-
conjugated goat anti-rabbit IgG (1:7500, Promega) for 1 h at
room temperature, washed three times in PBS-0.2% Tween-20 and
developed. Antiserum 7.5, raised against a peptide
containing DEEDEEEDPSSPE (SEQ ID NO: 16) derived from DP-3,
was used in the immunoblotting experiments.

The initial treatment with low salt (0.01 M) releases mostly
soluble cytoplasmic proteins, and the high salt (0.5 M) both
nuclear and cytoplasmic proteins, the insoluble material
remaining being collected in a fraction designated P. When
cells expressing the β variant were treated according to
this regime and the levels of β monitored by immunoblotting,
it was found to be present throughout the fractions, being
moderately enriched in the low salt fraction. In contrast,
when cells expressing δ were treated in a similar fashion,
the δ protein was far more enriched in the P fraction.
Thus, the extraction properties of β and δ are different,
and the E region (the only difference between the β and δ
proteins) is responsible for these differences.

It was possible that the differences in biochemical
properties reflected distinct intracellular distributions of
the DP-3 proteins. To test this idea we expressed each of
the variants in COS7 cells and determined their
intracellular location by immunostaining with the anti-DP-3
antiserum 7.2, which is useful for this purpose because it
recognises only the exogenous DP-3 protein. For
immunofluorescence, cells were grown on coverslips in
3-cm-diameter dishes.
When the α, β, γ or δ variant was expressed in COS7
cells, their intracellular distributions fell into two
distinct categories: α and δ accumulated in nuclei, whereas β
and γ were distributed throughout the cytoplasm with
low-level staining in nuclei. Although the α and δ proteins
were exclusively nuclear, minor variation was apparent in the
distribution of the β and γ proteins within a transfected
culture of asynchronous cells. For example, β and γ
were usually present at higher levels in the cytoplasm
than in nuclei, although occasional cells (less than 5%
of transfected cells) were seen in which the proteins were
present at similar levels in the nucleus and the
cytoplasm; a possible explanation for these observations
is suggested later. In summary, these data establish
that the differences in protein sequence between the
variants influence their intracellular distribution.
Specifically, the presence of the E region in α and δ, but
not in β and γ, correlates with the ability of the protein to
accumulate efficiently in nuclei.

The immunofluorescence was performed as follows.

Transfected cells were fixed in 4% formaldehyde, rinsed and
permeabilized in phosphate-buffered saline (PBS) containing
1% Triton X-100. Fixed cells were blocked in PBS containing
1% FCS, incubated with the primary antibodies diluted in
PBS-1% FCS for 30 min at room temperature, washed three
times with PBS and incubated with the secondary antibodies
diluted in PBS-10% FCS for 30 min at room temperature.
After a final wash with PBS, the coverslips were mounted on
slides using Citifluor and examined with a Zeiss microscope.
Magnification was 630x unless otherwise indicated.

As primary antibodies we used: a rabbit polyclonal serum,
called 7.2, raised against a DP-3-specific peptide common to
all the DP-3 variants; a rabbit polyclonal serum which
detects luciferase (Promega); a DP-1 antiserum (098) raised
against a C-terminal peptide in DP-1; and the anti-HA
monoclonal antibody 12CA5 (BAbCO). Secondary antibodies
were goat anti-rabbit IgG conjugated to fluorescein
isothiocyanate (FITC, 1:200) and goat anti-mouse IgG
conjugated to tetramethylrhodamine isothiocyanate (TRITC,
1:200) (Southern Biotechnology Associates Inc). Anti-
peptide serum 7.2 was raised against the sequence
VALATGQLPASNSHQ (SEQ ID NO: 17), common to all DP-3 proteins.
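Where the two antigen peptides lie can be checked against the sequence listing. The sketch below assumes a one-letter transcription (made here) of residues 385-446 of SEQ ID NO: 2 and reports 1-based residue numbers for each peptide; both fall in the C-terminal part of that polypeptide.

```python
# Residues 385-446 of SEQ ID NO: 2, transcribed from three-letter code.
SEQ2_START = 385
SEQ2_TAIL = (
    "SSVNQGLCLDAEVALA"   # 385-400
    "TGQLPASNSHQSSSAA"   # 401-416
    "SHFSESRGETPCSFND"   # 417-432
    "EDEEDEEEDPSSPE"     # 433-446
)


def locate_peptide(peptide, segment=SEQ2_TAIL, first_residue=SEQ2_START):
    """Return 1-based (start, end) residue numbers of peptide within segment."""
    idx = segment.find(peptide)
    if idx < 0:
        raise ValueError(f"{peptide} not found")
    start = first_residue + idx
    return start, start + len(peptide) - 1


for name, pep in (("serum 7.2 (SEQ ID NO: 17)", "VALATGQLPASNSHQ"),
                  ("serum 7.5 (SEQ ID NO: 16)", "DEEDEEEDPSSPE")):
    print(name, locate_peptide(pep))
```

This prints residues 397-411 for the 7.2 peptide and 434-446 (the C-terminus) for the 7.5 peptide.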

Example 2: The E region is necessary for nuclear 
localization. 

Since the only difference between the β and δ proteins is the
16-amino-acid E region, it was likely that the E region is
necessary for the nuclear accumulation of δ. To test this
idea, we removed the E region from the α variant (which, like
δ, accumulates in nuclei) to create αΔE, and compared the
intracellular distribution of the mutated protein with that of
wild-type α by immunofluorescence in transfected COS7 cells
as described above. In the absence of the E region, the
intracellular distribution of αΔE was altered to one resembling
that of β, since it failed to accumulate efficiently in nuclei.
These data support the implication of the previous studies that
the E region is required for efficient nuclear
accumulation, and thus suggest that it may function as, or
contribute to, a nuclear localization signal (NLS).

Example 3: An extended E region functions as a nuclear
localization signal.

An NLS can be defined experimentally either by showing that its
deletion causes a loss of nuclear accumulation or by showing that
it transfers the nuclear accumulation phenotype to a non-nuclear
protein. The previous results indicate that the properties of the
E region are compatible with the first criterion. To address the
second, we attached either the E region or an extended E region
containing an additional 8 residues from its C-terminal boundary
onto luciferase (see Example 1 above for plasmid constructions).
When expressed in COS7 cells, wild-type luciferase was
distributed throughout the cell, being marginally more
abundant within the cytoplasm; the protein had a very
similar distribution in all cells expressing wild-type
luciferase. Insertion of the E region (pGL-E) did not
significantly alter the distribution of the luciferase
protein. However, when an additional 8 residues were
inserted (pGL-Eb), nuclear accumulation became far more
efficient. Thus, efficient nuclear accumulation of luciferase
required the E region together with additional residues
located beyond its C-terminal boundary; the E region alone
was not enough.

Together, these data suggest that the E region is necessary
but not sufficient for the nuclear accumulation phenotype,
and thus that the 16-residue sequence is unlikely to contain an
autonomous nuclear localization signal. Rather, the E
region functions co-operatively with an
additional part of the protein located at the C-terminal
boundary of the E region to confer nuclear accumulation. In
this respect, the insertion of the E region may produce a
bi-partite nuclear localization signal characteristic of
many eukaryotic nuclear proteins, such as nucleoplasmin
(Dingwall and Laskey, 1991).

Example 4: The E region is encoded by an alternatively 
spliced exon. 

Although it was very likely that the presence of the E
region is regulated by alternative splicing, it was not
clear whether a discrete exon encoded the 16 amino acid
residues. To clarify this question we isolated the DP-3
gene and characterised its genomic organization across the
region encoding the E sequence. For this, a genomic library
prepared from murine embryonic stem cells was screened with
the DP-3 cDNA, positive clones were isolated, and the
relationship between genomic and cDNA sequence was then established.
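How many plaques must be screened for a single-copy locus such as the E exon to be represented can be estimated with the Clarke-Carbon relation N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one clone. The sketch assumes an average lambda insert of 15 kb and a haploid mouse genome of 2.7 x 10^9 bp; both figures are assumptions for illustration, not values from this study.

```python
import math


def clones_required(insert_bp, genome_bp, probability):
    """Clarke-Carbon: number of random clones N giving the stated probability
    that a particular sequence is present, N = ln(1 - P) / ln(1 - f)."""
    f = insert_bp / genome_bp
    return math.log(1.0 - probability) / math.log1p(-f)


def probability_present(n_clones, insert_bp, genome_bp):
    """Inverse form: P = 1 - (1 - f) ** N, evaluated via logs for accuracy."""
    f = insert_bp / genome_bp
    return 1.0 - math.exp(n_clones * math.log1p(-f))


INSERT_BP = 15_000      # assumed average lambda insert size
GENOME_BP = 2.7e9       # assumed haploid mouse genome size

print(f"{clones_required(INSERT_BP, GENOME_BP, 0.99):.3g} clones for P = 0.99")
print(f"P = {probability_present(1e6, INSERT_BP, GENOME_BP):.4f} for 1e6 pfu")
```

Under these assumptions about 8.3 x 10^5 clones are needed for 99% probability, so plating approximately 10^6 pfu would be expected to include the locus with a probability of roughly 99.6%.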
A λGEM-12 genomic library prepared from the embryonic stem cell
line SV129D3 was plated (approximately 10^6 pfu) and
transferred to Hybond N (Amersham International). Filters
were hybridised in QuikHyb solution (Stratagene) at 65°C
with a 32P-labelled mouse DP-3α cDNA (Ormondroyd et al.,
1995). A positive genomic clone which contained the genomic
E region was identified by Southern blotting using a
radiolabelled oligonucleotide antisense to the E region
(nucleotides 358-407 of DP-3α). A genomic fragment containing the E
exon was then cloned into pBluescript (pBS, Stratagene) and
sequenced using a Sequenase version 2.0 kit (USB). Oligo-
nucleotides for PCR and sequencing were designed from E+ mouse
DP-3 cDNA sequences (Ormondroyd et al., 1995).
Oligonucleotide sequences were as follows:
5' of E region, 7.16S: 5'-CACCCGCAATGGTCACT-3' (SEQ ID NO: 18);
3' of E region, 7.17A: 5'-ATGTCTCAAGCCTTTCCC-3' (SEQ ID NO: 19);
5' end of E region, E1-S: 5'-GATAGAAAACGAGCTAGAG-3' (SEQ ID NO: 20);
3' end of E region, E2-A: 5'-TTCTGAGAAATCAGAGTCTA-3' (SEQ ID NO: 21).
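A quick sanity check on the four oligonucleotides is to compute their length, GC content and an approximate melting temperature. The sketch below uses the Wallace rule, Tm ≈ 2 °C x (A + T) + 4 °C x (G + C), which is only a rough guide for short oligonucleotides and is not stated to be the design method used here. It also computes the reverse complement of the antisense primer E2-A, i.e. the sense-strand sequence at its binding site.

```python
def gc_fraction(oligo):
    """Fraction of G and C bases in an oligonucleotide."""
    oligo = oligo.upper()
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def wallace_tm(oligo):
    """Wallace rule: Tm ~ 2 C per A/T + 4 C per G/C (rough, short oligos only)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def reverse_complement(oligo):
    """Reverse complement of an unambiguous DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(oligo.upper()))


PRIMERS = {
    "7.16S": "CACCCGCAATGGTCACT",      # SEQ ID NO: 18
    "7.17A": "ATGTCTCAAGCCTTTCCC",     # SEQ ID NO: 19
    "E1-S": "GATAGAAAACGAGCTAGAG",     # SEQ ID NO: 20
    "E2-A": "TTCTGAGAAATCAGAGTCTA",    # SEQ ID NO: 21
}

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, Tm ~{wallace_tm(seq)} C")
print("E2-A sense-strand site:", reverse_complement(PRIMERS["E2-A"]))
```

By this rule all four oligonucleotides come out at about 54 °C, despite their different lengths and GC contents.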

The analysis indicated that the 16 residues which constitute
the E region are indeed encoded by a single 48 bp exon.
Conventional splice acceptor and donor sites exist at the
boundaries of the E exon, which is flanked by two large
introns and, beyond them, by exon sequence encoding the
surrounding DP-3 protein. This isolation and
characterisation of the DP-3 gene indicated that the E
region is encoded by a discrete, alternatively spliced exon.
This is illustrated further in Figure 1.
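The exon length is consistent with the E region being added or removed without disturbing the downstream reading frame, because 48 is a multiple of 3. A minimal sketch of that check follows; the residue count additionally assumes that both exon boundaries fall between codons (phase 0), which is an assumption rather than something stated here.

```python
def skipping_keeps_frame(exon_length_nt):
    """Skipping an internal exon preserves the downstream reading frame
    only if the exon length is a multiple of 3."""
    return exon_length_nt % 3 == 0


def residues_encoded(exon_length_nt):
    """Whole codons in an in-frame exon, assuming phase-0 boundaries."""
    if not skipping_keeps_frame(exon_length_nt):
        raise ValueError("exon length is not a multiple of 3")
    return exon_length_nt // 3


E_EXON_NT = 48  # length of the E exon reported above
print(skipping_keeps_frame(E_EXON_NT), residues_encoded(E_EXON_NT))  # True 16
```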

Example 5: DP-1 lacks an autonomous nuclear localization
signal.

A comparison of the E region of DP-3 with the equivalent region of
DP-1 indicated that DP-1 lacks a domain analogous to E
(Ormondroyd et al., 1995). Furthermore, extensive searches
for alternatively spliced DP-1 mRNAs have so far
failed, and we therefore investigated the intracellular location
of exogenous DP-1 expressed in COS7 cells, using
methods essentially as described above.

The DP-1 protein had a distribution similar to that of the β and γ
(E-minus) forms of DP-3: it was located throughout
the cytoplasm with occasional low-level staining in nuclei,
a result entirely compatible with the absence of
the E region. The absence of DP-1 from nuclei was due to the
lack of an NLS, since exogenous DP-1 accumulated efficiently
in nuclei after attachment of a foreign nuclear
localization signal, the bi-partite signal taken from
the Bel-1 protein (Chang et al., 1995). These data suggest
that DP-1 is not actively retained in the cytoplasm but
rather that its cytoplasmic location is passive.

Example 6: E2F-1 can recruit DP-1 and cytoplasmic DP-3
proteins to nuclei.

The result of Example 5 suggests that the cytoplasmic
location of exogenous DP-1 is passive. We reasoned that, in
the absence of an autonomous NLS, one possible mechanism to
promote the nuclear accumulation of DP-1 may involve an
interaction with its physiological partner, namely the E2F-1
protein. To test this idea, we studied the location of the
E2F-1 protein in COS7 cells and then the effect of co-
expressing E2F-1 and DP-1 in the same cells.

An E2F-1 protein tagged at its N-terminus with a
haemagglutinin (HA) epitope, visualised by immunostaining
with an anti-HA monoclonal antibody, was exclusively nuclear.
To assess the influence of E2F-1 on DP-1, both proteins were
co-expressed and their intracellular distribution determined
by double immunostaining with the anti-HA monoclonal antibody
and rabbit anti-DP-1. Neither the fluorescein-conjugated
anti-rabbit immunoglobulin nor the rhodamine-conjugated anti-
mouse immunoglobulin cross-reacted with the anti-HA
monoclonal antibody or the rabbit anti-DP-1, respectively.
There was a striking difference in the distribution of DP-1
upon co-expression of E2F-1: cells expressing the E2F-1
protein contained nuclear DP-1, in contrast to its
cytoplasmic location in the absence of E2F-1. In the rare
exceptions where transfected cells expressed only DP-1
(about 1% of the total transfected population), the exogenous
DP-1 was cytoplasmic. These data strongly suggest that, upon
forming a DP-1/E2F-1 heterodimer, E2F-1 has a dominant
influence in recruiting DP-1 to a nuclear location.

We assessed whether E2F-1 had a similar effect on DP-3β and αΔE.
Co-expression of DP-3β or αΔE with E2F-1 resulted in their
nuclear recruitment. The presence of DP-1 or DP-3β in
nuclei is therefore likely to depend upon an
interaction with the appropriate E2F heterodimeric partner,
which subsequently causes the efficient nuclear accumulation
of DP proteins.

Example 7: E2F-1 contains an NLS.

The ability of E2F-1 to recruit DP-1 to the nucleus was
investigated further to identify the E2F-1 NLS. Various
experiments are used for this purpose. Deletion mutants of
E2F-1 are made and tested for their ability to recruit
DP-1 to the nucleus. Experiments indicate that the NLS of
E2F-1 (SEQ ID NO: 12) is located at residues 85-91.

Discussion: Part A: Summary. 

The transport of macromolecules between the cytoplasm and
nucleus is mediated in both directions by supramolecular
structures which span the nuclear envelope, called
nuclear pore complexes (NPCs). Although small
macromolecules (less than 40-60 kD) can diffuse through NPCs,
karyophilic proteins of any size are imported by a
selective, energy-dependent two-step mechanism
(Fabre and Hurt, 1994; Melchior and Gerace, 1995). Active
transport of proteins into the nucleus depends upon
short stretches of amino acid residues known as nuclear
localization signals (NLSs) and, although consensus NLS
sequences have been difficult to define, they frequently
consist of clusters of basic residues which may be
continuous or bi-partite in nature (Dingwall and
Laskey, 1991; Boulikas, 1993).
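Whether a protein could enter the nucleus by diffusion alone depends on its size relative to the 40-60 kD range quoted above. A rough estimate from residue count, assuming an average residue mass of about 110 Da (a common rule of thumb), is sketched below for the 446-residue polypeptide of SEQ ID NO: 2. It ignores tags, actual composition and post-translational modification, so it is an estimate, not a measured mass, and the three size classes are a simplification of the quoted range.

```python
AVERAGE_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def approx_mass_kda(n_residues):
    """Approximate polypeptide mass (kDa) from residue count alone."""
    return n_residues * AVERAGE_RESIDUE_DA / 1000.0


def diffusion_class(mass_kda, lower=40.0, upper=60.0):
    """Place a mass relative to the 40-60 kD passive-diffusion range."""
    if mass_kda < lower:
        return "below range: small enough to diffuse"
    if mass_kda > upper:
        return "above range: diffusion unlikely"
    return "within range: size alone is not decisive"


mass = approx_mass_kda(446)  # SEQ ID NO: 2
print(f"{mass:.1f} kDa -> {diffusion_class(mass)}")
```

On this rough estimate (about 49 kDa) the polypeptide falls inside the quoted range, so size alone would not settle whether it can pass through NPCs by diffusion.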

Since transcription factors exert their effects on gene
expression within the nucleus, it is possible that their
activity could be regulated through control of their
intracellular location. Mechanisms have been described
which influence nuclear accumulation in response to a
specific signal, such as direct post-translational
modification of the transcription factor, dissociation of an
inhibitory subunit which masks the NLS, and interaction with
a nuclear localizing protein (Whiteside and Goodbourn,
1993). Well-documented examples occur in the NF-κB/Rel
family of proteins, where proteolytic cleavage of a
cytoplasmic precursor or an interaction with cytoplasmic IκB
and related proteins controls nuclear accumulation of the
functional transcription factor (Siebenlist et al., 1995;
Norris and Manley, 1995). The glucocorticoid receptor is
held in the cytoplasm by virtue of an interaction with heat
shock protein 90, and hormone binding is widely believed to
promote nuclear entry by dissociating the receptor-hsp90
complex (Evans, 1988). In this study, we have documented
for the first time mechanisms mediated at the level of
intracellular location which influence the nuclear
accumulation of the E2F heterodimer.

Part B: An alternatively spliced nuclear localization signal
in the E2F transcription factor.

The E2F transcription factor plays an important role in
integrating cell cycle progression with transcription
(Nevins, 1992; La Thangue, 1994; Müller, 1995; Weinberg,
1995). In physiological E2F, members of two distinct
families of proteins, DP and E2F, interact as DP/E2F
heterodimers (Bandara et al., 1993), with the functional
consequences being co-operative DNA binding, pocket protein
binding and transcriptional activation (Bandara et al.,
1993; Helin et al., 1993a; Krek et al., 1993). A number of
different levels of control are known to be exerted upon the
E2F heterodimer, such as binding and transcriptional
repression by the pocket proteins (Helin et al., 1993b;
Flemington et al., 1993), phosphorylation by cdk complexes
(Krek et al., 1994; 1995) and transcriptional activation by
the MDM2 oncoprotein (Martin et al., 1995). Here, we have
described an additional mechanism of control in regulating
the activity of E2F, mediated at the level of intracellular
location. Specifically, our data show that two alternative
mechanisms exist which control the nuclear accumulation of
the DP/E2F heterodimer: firstly, alternative splicing and,
secondly, the subunit composition of the heterodimer.

These conclusions relate to previous observations made on
the DP-3 gene, which encodes a number of discrete mRNAs that
arise through alternative splicing (Ormondroyd et al.,
1995). One of these processing events determines whether
the E region is incorporated in the protein. Here, we show
that the E region is encoded by an alternatively spliced
exon which, together with an additional C-terminal
extension, can confer efficient nuclear accumulation. The E
region therefore contributes to a nuclear localization
signal.

Interestingly, comparison of the sequence of the sixteen
amino acid residues within the E region with previously
defined NLSs suggests a closer resemblance to a bi-partite
NLS than to the NLS characteristic of SV40 large T
antigen (Dingwall and Laskey, 1991). Although there is some
similarity to the SV40 large T antigen-like NLS, neither the
sequence nor the functional properties of the E region
completely satisfy the requirements for this type of NLS
(Boulikas, 1993; 1994). For example, the consensus core
sequence for an SV40 large T-like motif is likely to consist
of at least four arginine and lysine residues, whereas the
cluster within the E region consists of three basic
residues. Secondly, acidic residues are rarely included
within the signal sequence, yet the E region cluster
contains an aspartate residue embedded within it.
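The two criteria above can be written as a crude test on a candidate core: at least four lysine/arginine residues and no aspartate or glutamate. The sketch below is a simplification for illustration, not the published consensus; the SV40 large T antigen NLS PKKKRKV serves as a positive example, and RKDR is a hypothetical three-basic cluster containing an aspartate.

```python
BASIC = set("KR")
ACIDIC = set("DE")


def sv40_like_core(core, min_basic=4):
    """Crude check: at least min_basic K/R residues and no D/E residues."""
    core = core.upper()
    n_basic = sum(aa in BASIC for aa in core)
    n_acidic = sum(aa in ACIDIC for aa in core)
    return n_basic >= min_basic and n_acidic == 0


print(sv40_like_core("PKKKRKV"))  # SV40 large T antigen NLS -> True
print(sv40_like_core("RKDR"))     # hypothetical 3-basic cluster with Asp -> False
```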

Functional evidence for this idea was obtained by
determining whether the E region is necessary and sufficient for
nuclear accumulation. Although necessary in the context of
the wild-type DP-3 sequence, the E region alone was not
sufficient to confer efficient nuclear accumulation onto a
non-nuclear protein, but rather required an additional
region located immediately C-terminal of the E region. This
sequence, together with the cluster of basic residues within
the E region, has an arrangement and characteristics similar
to those of a bi-partite NLS, namely two clusters of basic amino
acid residues separated by a spacer region (Dingwall and
Laskey, 1991; La Casse and Lefebvre, 1995). In the DP-3
variants β and γ, which lack the E region, the N-terminal
half of the bi-partite signal is removed by the splicing of
the E exon.
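The bi-partite arrangement described above is often summarised as two adjacent basic residues, a spacer of about 10-12 residues, and a downstream cluster in which at least three of the next five residues are basic. A minimal scanner for that simplified pattern is sketched below; the thresholds are an assumption, and real NLS activity also depends on context that a pattern cannot capture. The nucleoplasmin signal KRPAATKKAGQAKKKK serves as a positive example.

```python
BASIC = set("KR")


def find_bipartite_nls(seq, spacer_min=10, spacer_max=12, window=5, min_basic=3):
    """Scan for a simplified bipartite NLS pattern.

    Pattern: two adjacent basic residues, a spacer of spacer_min..spacer_max
    residues, then at least min_basic basic residues among the next `window`
    residues. Returns a list of (start_index, spacer_length), 0-based.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 1):
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            for spacer in range(spacer_min, spacer_max + 1):
                j = i + 2 + spacer              # first residue of downstream cluster
                downstream = seq[j:j + window]  # may be shorter at the C-terminus
                if sum(aa in BASIC for aa in downstream) >= min_basic:
                    hits.append((i, spacer))
    return hits


nucleoplasmin = "KRPAATKKAGQAKKKK"
print(find_bipartite_nls(nucleoplasmin))      # [(0, 10), (0, 11)]
print(find_bipartite_nls(nucleoplasmin[2:]))  # upstream KR removed -> []
print(find_bipartite_nls("PKKKRKV"))          # SV40 monopartite NLS -> []
```

Removing the upstream basic pair, loosely analogous to skipping an exon that carries one half of the signal, eliminates the hit.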

The role of alternative splicing as a mechanism for
generating protein isoforms with different functional
properties has been widely described. The inclusion of
sequences which function as NLSs has been reported in
several cases, such as the nuclear mitotic apparatus
(NuMA) protein (Tang et al., 1994), CaM kinase (Srinivasan
et al., 1994) and deoxynucleotidyl transferase (Bentolila et
al., 1995). An interesting situation occurs in the Max
gene, which encodes a heterodimeric partner for Myc: Max
RNA is alternatively spliced to produce a Max protein
truncated at the C-terminus and lacking an NLS (Mäkelä et
al., 1992). In contrast to wild-type Max, the truncated Max
protein enhances the transformation activity of Myc (Mäkelä
et al., 1992). Nevertheless, a physiological splicing event
which regulates a bi-partite NLS in this fashion, by
removing one of the clusters of basic residues, is, to our
knowledge, novel. Thus, these data define a previously
unidentified level of control in the E2F transcription
factor and could, more generally, indicate a new mechanism
for regulating the activity of bi-partite NLSs through RNA
processing.

Although these data establish a dependence upon the E region
for nuclear accumulation, they do not distinguish between
the possibilities that the E region regulates nuclear entry
or nuclear export. For example, it is possible that E- variants can
enter and exit nuclei, and that the presence of the E region
impedes nuclear export, resulting in net nuclear
accumulation. Such a possibility would be compatible with
the altered biochemical extraction properties conferred by
the E region, which suggested that the E region may be
involved in tethering to an insoluble nuclear structure.
Interestingly, pRb is believed to be held in the nucleus by
a tethering process, a property characteristic of the
hypophosphorylated protein and thus potentially important in
mediating the physiological effects of cell cycle arrest
(Mittnacht et al., 1991).
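Why the steady-state distribution alone cannot separate these possibilities can be seen from a minimal two-compartment model, assumed here purely for illustration: N and C are the amounts of a DP variant in the nucleus and cytoplasm, T = N + C is constant, and import and export are first-order with rate constants k_in and k_out.

```latex
\begin{aligned}
\frac{dN}{dt} &= k_{\mathrm{in}}\,C - k_{\mathrm{out}}\,N, \qquad N + C = T,\\
\frac{dN}{dt} = 0 \;&\Longrightarrow\; \frac{N}{C} = \frac{k_{\mathrm{in}}}{k_{\mathrm{out}}},
\qquad \frac{N}{T} = \frac{k_{\mathrm{in}}}{k_{\mathrm{in}} + k_{\mathrm{out}}}.
\end{aligned}
```

In this model, halving k_out has exactly the same effect on the steady-state ratio as doubling k_in, so an E region that impedes export (for example by tethering) and one that enhances import would give the same distribution; telling them apart would require measuring the rates rather than the distribution.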

Part C: Heterodimer formation between DP and E2F family
members provides a mechanism for efficient nuclear
accumulation.

The DP-3β and γ variants fail to accumulate in nuclei when
expressed in COS7 cells, a phenotype which can now be
directly attributed to the absence of the E region. The DP-1
protein, which lacks a region analogous to E (Girling et
al., 1993; Ormondroyd et al., 1995), behaved as
predicted for an E- DP variant, since exogenous DP-1 protein
in COS7 cells had a location similar to that of the DP-3 E-
variants.
The distribution of the E- DP variants, which are
predominantly cytoplasmic, could result from one of several
scenarios. For example, passive diffusion may occur such
that at equilibrium the proteins are more abundant within
the cytoplasm. Alternatively, the proteins may have a weak
NLS which fails to target them efficiently to nuclei, a
possibility consistent with the E- variants still possessing
one half of the bi-partite NLS and with observations made on the
nucleoplasmin NLS, where elimination of one half of the bi-
partite signal does not completely abolish nuclear
accumulation (Robbins et al., 1991). Finally, it is also
possible that the cytoplasmic pattern results from an active
retention mechanism. However, this latter possibility is
unlikely, since a heterologous NLS was sufficient to confer a
nuclear accumulation phenotype.

We reasoned that there must be physiological mechanisms
which promote the efficient nuclear accumulation of DP-1,
given that endogenous DP-1 is nuclear (data not shown).
We therefore tested whether formation of a DP/E2F
heterodimer was involved in such a mechanism. These experiments
indicated that co-expression of E2F-1 recruited E- DP
proteins to nuclei, and thus that heterodimerization with an
appropriate E2F family member is likely to be sufficient to
promote nuclear accumulation. Mechanistically, the nuclear
accumulation of E- DP variants upon interaction with E2F-1
may occur if E2F-1 is tethered within the nucleus and,
upon interacting with DP variants, causes their retention in
the nucleus. Alternatively, the interaction with E2F-1 may
occur within the cytoplasm, and the physical interaction with
E2F-1 may be responsible for delivering E- DP variants to the
nucleus. Overall, these data suggest two distinct
mechanisms for the nuclear accumulation of DP proteins, one
dependent on the presence of an intrinsic sequence in the
protein and the other on an interaction with the appropriate
E2F partner.

The fact that heterodimer formation can promote nuclear
accumulation provides a likely explanation for the small
proportion of COS7 cells which contain exogenous nuclear β
protein. We suggest that in such cells β has a nuclear
location by virtue of an interaction and heterodimer
formation with endogenous E2F proteins.

Part D: Physiological implications 

A mechanism through which nuclear accumulation depends
upon heterodimerization has a number of important
implications for the regulation of the functional E2F
transcription factor, that is, the DP/E2F heterodimer. For
example, it would favour the presence in nuclei of DP/E2F
heterodimers, the physiological form involved in
transcriptional activation (Bandara et al., 1993; Helin et
al., 1993b; Krek et al., 1993), perhaps preventing
some non-specific and/or undesirable interactions from occurring.
It may, in addition, provide a mechanism whereby the
induction of nuclear DP/E2F heterodimers depends on a
rate-limiting E2F partner. Indeed, expression of the
E2F-1 gene is known to be under cell cycle control, in
contrast to DP-1, which in some cell types is constitutively
expressed during the cell cycle (Slansky et al., 1993). In
such a model, although DP-1 is expressed, its contribution to
transcriptional activation in the context of the DP/E2F
heterodimer during the cell cycle will be strictly dependent
upon the levels of E2F-1.
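The rate-limiting-partner argument can be illustrated with a simple equilibrium binding model. The assumptions are illustrative only: DP-1 and E2F-1 form a 1:1 complex with a single dissociation constant Kd, and the nuclear DP-1 pool is equated with the DP-1/E2F-1 complex. Solving Kd = (D - x)(E - x) / x for the complex x gives the quadratic root used below; all concentrations and the Kd are hypothetical.

```python
import math


def heterodimer(dp_total, e2f_total, kd):
    """Equilibrium 1:1 complex for DP + E2F <-> DP/E2F (exact quadratic root)."""
    b = dp_total + e2f_total + kd
    return (b - math.sqrt(b * b - 4.0 * dp_total * e2f_total)) / 2.0


DP1 = 100.0   # hypothetical constant DP-1 level (arbitrary units)
KD = 1.0      # hypothetical dissociation constant, same units

for e2f1 in (0.0, 10.0, 50.0, 100.0, 200.0):
    print(f"E2F-1 {e2f1:6.1f} -> DP/E2F {heterodimer(DP1, e2f1, KD):6.1f}")
```

With a tight Kd the complex tracks the E2F-1 level (about 9.9, 49.0 and 90.5 units for 10, 50 and 100 units of E2F-1) and plateaus near the DP-1 level (about 99.0 units at 200 units of E2F-1), which is the sense in which E2F-1 would be rate limiting while DP-1 is constant.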

We have established that the E region of DP proteins is
required for nuclear accumulation, and that it likely
functions as part of a bi-partite nuclear localization signal.
Although this situation is novel, we have yet to
understand the role that this mechanism plays in
physiological E2F and in the regulation of cell cycle
progression. We suggest that the E+ variants of
DP proteins may function, in a fashion analogous to E2F-1 with
DP-1, to recruit to the nucleus proteins which are capable of
interacting with E+ variants but which lack an autonomous
nuclear localization signal.

In conclusion, this study has highlighted a new and
unexpected mechanism of control in regulating the activity
of the E2F heterodimer. Specifically, nuclear accumulation
is dramatically influenced by two distinct levels of
control: alternative splicing of an exon which contributes
to a nuclear localization signal, and the subunit composition
of the E2F heterodimer. It is likely that this control
plays an important role in regulating the activity of the
E2F transcription factor and hence cell cycle progression.
SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Medical Research Council 

(B) STREET: 20 Park Crescent 

(C) CITY: London 

(E) COUNTRY: United Kingdom 

(F) POSTAL CODE (ZIP): W1N 4AL

(A) NAME: La Thangue, Nicholas Barrie 

(B) STREET: Institute of Biomedical and Life Sciences, 

Davidson Building, University of Glasgow 

(C) CITY: Glasgow 

(E) COUNTRY: United Kingdom 

(F) POSTAL CODE (ZIP): G12 8QQ 

(A) NAME: De La Luna, Susana 

(B) STREET: Institute of Biomedical and Life Sciences, 

Davidson Building, University of Glasgow 

(C) CITY: Glasgow 

(E) COUNTRY: United Kingdom 

(F) POSTAL CODE (ZIP): G12 8QQ 

(ii) TITLE OF INVENTION: DP and E2F protein nuclear localisation
signals and their use 

(iii) NUMBER OF SEQUENCES: 21 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(v) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/GB97/01324
(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: GB 9610195.1 

(B) FILING DATE: 15-MAY-1996 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1385 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1338 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATG ACG GCA AAA AAT GTT GGT TTG CCA TCC ACA AAT GCA GAG CTG AGG 48 
Met Thr Ala Lys Asn Val Gly Leu Pro Ser Thr Asn Ala Glu Leu Arg 
1 5 10 15 
GGC TTT ATA GAT CAG AAT TTC AGT CCA ACG AAA GGT AAC ATT TCA CTT 96
Gly Phe Ile Asp Gln Asn Phe Ser Pro Thr Lys Gly Asn Ile Ser Leu
20 25 30

GTT GCC TTT CCA GTT TCA AGC ACC AAC TCA CCA ACA AAG ATT TTA CCG 144
Val Ala Phe Pro Val Ser Ser Thr Asn Ser Pro Thr Lys Ile Leu Pro
35 40 45

AAA ACC TTA GGG CCA ATA AAT GTG AAT GTT GGA CCC CAA ATG ATT ATA 192
Lys Thr Leu Gly Pro Ile Asn Val Asn Val Gly Pro Gln Met Ile Ile
50 55 60

AGC ACA CCG CAG AGA ATT GCC AAT TCA GGA AGT GTT CTG ATT GGG AAT 240
Ser Thr Pro Gln Arg Ile Ala Asn Ser Gly Ser Val Leu Ile Gly Asn
65 70 75 80

CCA TAT ACC CCT GCA CCC GCA ATG GTC ACT CAG ACT CAC ATA GCT GAG 288
Pro Tyr Thr Pro Ala Pro Ala Met Val Thr Gln Thr His Ile Ala Glu
85 90 95

GCT GCT GGC TGG GTT CCC AGT GAT AGA AAA CGA GCT AGA GAA TTT ATA 336
Ala Ala Gly Trp Val Pro Ser Asp Arg Lys Arg Ala Arg Glu Phe Ile
100 105 110

GAC TCT GAT TTT TCA GAA AGT AAA CGA AGC AAA AAA GGA GAT AAA AAT 384
Asp Ser Asp Phe Ser Glu Ser Lys Arg Ser Lys Lys Gly Asp Lys Asn
115 120 125

GGG AAA GGC TTG AGA CAT TTT TCA ATG AAG GTG TGT GAG AAA GTT CAG 432
Gly Lys Gly Leu Arg His Phe Ser Met Lys Val Cys Glu Lys Val Gln
130 135 140

CGG AAA GGC ACA ACT TCA TAC AAT GAG GTA GCT GAT GAG CTG GTA TCT 480
Arg Lys Gly Thr Thr Ser Tyr Asn Glu Val Ala Asp Glu Leu Val Ser
145 150 155 160

GAG TTT ACC AAC TCA AAT AAC CAT CTG GCA GCT GAT TCG GCT TAT GAT 528
Glu Phe Thr Asn Ser Asn Asn His Leu Ala Ala Asp Ser Ala Tyr Asp
165 170 175

CAG GAG AAC ATT AGA CGA AGA GTT TAT GAT GCT TTA AAT GTA CTA ATG 576
Gln Glu Asn Ile Arg Arg Arg Val Tyr Asp Ala Leu Asn Val Leu Met
180 185 190

GCG ATG AAC ATA ATT TCA AAG GAA AAA AAA GAA ATC AAG TGG ATT GGC 624
Ala Met Asn Ile Ile Ser Lys Glu Lys Lys Glu Ile Lys Trp Ile Gly
195 200 205

CTG CCT ACC AAT TCT GCT CAG GAA TGC CAG AAC CTG GAA ATC GAG AAG 672
Leu Pro Thr Asn Ser Ala Gln Glu Cys Gln Asn Leu Glu Ile Glu Lys
210 215 220

CAG AGG CGG ATA GAA CGG ATA AAG CAG AAG CGA GCC CAG CTA CAA GAA 720
Gln Arg Arg Ile Glu Arg Ile Lys Gln Lys Arg Ala Gln Leu Gln Glu
225 230 235 240

CTT CTC CTT CAG CAA ATT GCT TTT AAA AAC CTG GTA CAG AGA AAT CGA 768
Leu Leu Leu Gln Gln Ile Ala Phe Lys Asn Leu Val Gln Arg Asn Arg
245 250 255

CAA AAT GAA CAA CAA AAC CAG GGC CCT CCA GCT GTG AAT TCC ACC ATT 816
Gln Asn Glu Gln Gln Asn Gln Gly Pro Pro Ala Val Asn Ser Thr Ile
260 265 270

CAG CTG CCA TTT ATA ATC ATT AAT ACA AGC AGG AAA ACA GTC ATA GAC 864
Gln Leu Pro Phe Ile Ile Ile Asn Thr Ser Arg Lys Thr Val Ile Asp
275 280 285
TGC AGC ATC TCC AGT GAC AAA TTT GAA TAC CTT TTT AAT TTT GAT AAC 912
Cys Ser Ile Ser Ser Asp Lys Phe Glu Tyr Leu Phe Asn Phe Asp Asn
290 295 300

ACC TTT GAG ATC CAC GAC GAC ATA GAG GTA CTG AAG CGG ATG GGA ATG 960
Thr Phe Glu Ile His Asp Asp Ile Glu Val Leu Lys Arg Met Gly Met
305 310 315 320

TCC TTT GGT CTG GAG TCA GGC AAA TGC TCT CTG GAG GAT CTG AAA ATC 1008
Ser Phe Gly Leu Glu Ser Gly Lys Cys Ser Leu Glu Asp Leu Lys Ile
325 330 335

GCA AGA TCC CTG GTT CCA AAA GCT TTA GAA GGC TAT ATT ACA GAT ATC 1056
Ala Arg Ser Leu Val Pro Lys Ala Leu Glu Gly Tyr Ile Thr Asp Ile
340 345 350

TCC ACA GGA CCT TCT TGG TTA AAT CAG GGA CTA CTT TTG AAC TCT ACC 1104
Ser Thr Gly Pro Ser Trp Leu Asn Gln Gly Leu Leu Leu Asn Ser Thr
355 360 365

CAA TCA GTT TCA AAT TTA GAC CCG ACC ACC GGT GCC ACT GTA CCC CAA 1152
Gln Ser Val Ser Asn Leu Asp Pro Thr Thr Gly Ala Thr Val Pro Gln
370 375 380

TCA AGT GTA AAC CAA GGG TTG TGC TTG GAT GCT GAA GTG GCC TTA GCA 1200
Ser Ser Val Asn Gln Gly Leu Cys Leu Asp Ala Glu Val Ala Leu Ala
385 390 395 400

ACT GGG CAG CTC CCT GCC TCA AAC AGT CAC CAG TCC AGC AGT GCA GCC 1248
Thr Gly Gln Leu Pro Ala Ser Asn Ser His Gln Ser Ser Ser Ala Ala
405 410 415

TCT CAC TTC TCG GAG TCC CGC GGC GAG ACC CCC TGT TCA TTC AAC GAT 1296
Ser His Phe Ser Glu Ser Arg Gly Glu Thr Pro Cys Ser Phe Asn Asp
420 425 430

GAA GAT GAG GAA GAT GAA GAG GAG GAT CCC TCC TCC CCA GAA 1338
Glu Asp Glu Glu Asp Glu Glu Glu Asp Pro Ser Ser Pro Glu
435 440 445

TAAAGACAGG AGAGAACTCA TGTTTTAAAA AAAAAAAAAA ACTCGAG 1385



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 446 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Thr Ala Lys Asn Val Gly Leu Pro Ser Thr Asn Ala Glu Leu Arg 
1 5 10 15 

Gly Phe Ile Asp Gln Asn Phe Ser Pro Thr Lys Gly Asn Ile Ser Leu
20 25 30

Val Ala Phe Pro Val Ser Ser Thr Asn Ser Pro Thr Lys Ile Leu Pro
35 40 45

Lys Thr Leu Gly Pro Ile Asn Val Asn Val Gly Pro Gln Met Ile Ile
50 55 60

Ser Thr Pro Gln Arg Ile Ala Asn Ser Gly Ser Val Leu Ile Gly Asn
65 70 75 80

Pro Tyr Thr Pro Ala Pro Ala Met Val Thr Gln Thr His Ile Ala Glu
85 90 95

Ala Ala Gly Trp Val Pro Ser Asp Arg Lys Arg Ala Arg Glu Phe Ile
100 105 110

Asp Ser Asp Phe Ser Glu Ser Lys Arg Ser Lys Lys Gly Asp Lys Asn
115 120 125

Gly Lys Gly Leu Arg His Phe Ser Met Lys Val Cys Glu Lys Val Gln
130 135 140

Arg Lys Gly Thr Thr Ser Tyr Asn Glu Val Ala Asp Glu Leu Val Ser
145 150 155 160

Glu Phe Thr Asn Ser Asn Asn His Leu Ala Ala Asp Ser Ala Tyr Asp
165 170 175

Gln Glu Asn Ile Arg Arg Arg Val Tyr Asp Ala Leu Asn Val Leu Met
180 185 190

Ala Met Asn Ile Ile Ser Lys Glu Lys Lys Glu Ile Lys Trp Ile Gly
195 200 205

Leu Pro Thr Asn Ser Ala Gln Glu Cys Gln Asn Leu Glu Ile Glu Lys
210 215 220

Gln Arg Arg Ile Glu Arg Ile Lys Gln Lys Arg Ala Gln Leu Gln Glu
225 230 235 240

Leu Leu Leu Gln Gln Ile Ala Phe Lys Asn Leu Val Gln Arg Asn Arg
245 250 255

Gln Asn Glu Gln Gln Asn Gln Gly Pro Pro Ala Val Asn Ser Thr Ile
260 265 270

Gln Leu Pro Phe Ile Ile Ile Asn Thr Ser Arg Lys Thr Val Ile Asp
275 280 285

Cys Ser Ile Ser Ser Asp Lys Phe Glu Tyr Leu Phe Asn Phe Asp Asn
290 295 300

Thr Phe Glu Ile His Asp Asp Ile Glu Val Leu Lys Arg Met Gly Met
305 310 315 320

Ser Phe Gly Leu Glu Ser Gly Lys Cys Ser Leu Glu Asp Leu Lys Ile
325 330 335

Ala Arg Ser Leu Val Pro Lys Ala Leu Glu Gly Tyr Ile Thr Asp Ile
340 345 350

Ser Thr Gly Pro Ser Trp Leu Asn Gln Gly Leu Leu Leu Asn Ser Thr
355 360 365

Gln Ser Val Ser Asn Leu Asp Pro Thr Thr Gly Ala Thr Val Pro Gln
370 375 380

Ser Ser Val Asn Gln Gly Leu Cys Leu Asp Ala Glu Val Ala Leu Ala
385 390 395 400

Thr Gly Gln Leu Pro Ala Ser Asn Ser His Gln Ser Ser Ser Ala Ala
405 410 415

Ser His Phe Ser Glu Ser Arg Gly Glu Thr Pro Cys Ser Phe Asn Asp
420 425 430

Glu Asp Glu Glu Asp Glu Glu Glu Asp Pro Ser Ser Pro Glu
435 440 445

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1154 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1107

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATG ATT ATA AGC ACA CCG CAG AGA ATT GCC AAT TCA GGA AGT GTT CTG 48 
Met Ile Ile Ser Thr Pro Gln Arg Ile Ala Asn Ser Gly Ser Val Leu
1 5 10 15

ATT GGG AAT CCA TAT ACC CCT GCA CCC GCA ATG GTC ACT CAG ACT CAC 96
Ile Gly Asn Pro Tyr Thr Pro Ala Pro Ala Met Val Thr Gln Thr His
20 25 30

ATA GCT GAG GCT GCT GGC TGG GTT CCC AGT AAA CGA AGC AAA AAA GGA 144
Ile Ala Glu Ala Ala Gly Trp Val Pro Ser Lys Arg Ser Lys Lys Gly
35 40 45

GAT AAA AAT GGG AAA GGC TTG AGA CAT TTT TCA ATG AAG GTG TGT GAG 192
Asp Lys Asn Gly Lys Gly Leu Arg His Phe Ser Met Lys Val Cys Glu
50 55 60

AAA GTT CAG CGG AAA GGC ACA ACT TCA TAC AAT GAG GTA GCT GAT GAG 240
Lys Val Gln Arg Lys Gly Thr Thr Ser Tyr Asn Glu Val Ala Asp Glu
65 70 75 80

CTG GTA TCT GAG TTT ACC AAC TCA AAT AAC CAT CTG GCA GCT GAT TCG 288
Leu Val Ser Glu Phe Thr Asn Ser Asn Asn His Leu Ala Ala Asp Ser
85 90 95

GCT TAT GAT CAG GAG AAC ATT AGA CGA AGA GTT TAT GAT GCT TTA AAT 336
Ala Tyr Asp Gln Glu Asn Ile Arg Arg Arg Val Tyr Asp Ala Leu Asn
100 105 110

GTA CTA ATG GCG ATG AAC ATA ATT TCA AAG GAA AAA AAA GAA ATC AAG 384
Val Leu Met Ala Met Asn Ile Ile Ser Lys Glu Lys Lys Glu Ile Lys
115 120 125

TGG ATT GGC CTG CCT ACC AAT TCT GCT CAG GAA TGC CAG AAC CTG GAA 432
Trp Ile Gly Leu Pro Thr Asn Ser Ala Gln Glu Cys Gln Asn Leu Glu
130 135 140

ATC GAG AAG CAG AGG CGG ATA GAA CGG ATA AAG CAG AAG CGA GCC CAG 480
Ile Glu Lys Gln Arg Arg Ile Glu Arg Ile Lys Gln Lys Arg Ala Gln
145 150 155 160

CTA CAA GAA CTT CTC CTT CAG CAA ATT GCT TTT AAA AAC CTG GTA CAG 528
Leu Gln Glu Leu Leu Leu Gln Gln Ile Ala Phe Lys Asn Leu Val Gln
165 170 175

AGA AAT CGA CAA AAT GAA CAA CAA AAC CAG GGC CCT CCA GCT GTG AAT 576
Arg Asn Arg Gln Asn Glu Gln Gln Asn Gln Gly Pro Pro Ala Val Asn
180 185 190

TCC ACC ATT CAG CTG CCA TTT ATA ATC ATT AAT ACA AGC AGG AAA ACA 624
Ser Thr Ile Gln Leu Pro Phe Ile Ile Ile Asn Thr Ser Arg Lys Thr
195 200 205

GTC ATA GAC TGC AGC ATC TCC AGT GAC AAA TTT GAA TAC CTT TTT AAT 672
Val Ile Asp Cys Ser Ile Ser Ser Asp Lys Phe Glu Tyr Leu Phe Asn
210 215 220

TTT GAT AAC ACC TTT GAG ATC CAC GAC GAC ATA GAG GTA CTG AAG CGG 720
Phe Asp Asn Thr Phe Glu Ile His Asp Asp Ile Glu Val Leu Lys Arg
225 230 235 240

ATG GGA ATG TCC TTT GGT CTG GAG TCA GGC AAA TGC TCT CTG GAG GAT 768
Met Gly Met Ser Phe Gly Leu Glu Ser Gly Lys Cys Ser Leu Glu Asp
245 250 255

CTG AAA ATC GCA AGA TCC CTG GTT CCA AAA GCT TTA GAA GGC TAT ATT 816
Leu Lys Ile Ala Arg Ser Leu Val Pro Lys Ala Leu Glu Gly Tyr Ile
260 265 270

ACA GAT ATC TCC ACA GGA CCT TCT TGG TTA AAT CAG GGA CTA CTT TTG 864
Thr Asp Ile Ser Thr Gly Pro Ser Trp Leu Asn Gln Gly Leu Leu Leu
275 280 285

AAC TCT ACC CAA TCA GTT TCA AAT TTA GAC CCG ACC ACC GGT GCC ACT 912
Asn Ser Thr Gln Ser Val Ser Asn Leu Asp Pro Thr Thr Gly Ala Thr
290 295 300

GTA CCC CAA TCA AGT GTA AAC CAA GGG TTG TGC TTG GAT GCT GAA GTG 960
Val Pro Gln Ser Ser Val Asn Gln Gly Leu Cys Leu Asp Ala Glu Val
305 310 315 320

GCC TTA GCA ACT GGG CAG CTC CCT GCC TCA AAC AGT CAC CAG TCC AGC 1008
Ala Leu Ala Thr Gly Gln Leu Pro Ala Ser Asn Ser His Gln Ser Ser
325 330 335

AGT GCA GCC TCT CAC TTC TCG GAG TCC CGC GGC GAG ACC CCC TGT TCA 1056
Ser Ala Ala Ser His Phe Ser Glu Ser Arg Gly Glu Thr Pro Cys Ser
340 345 350

TTC AAC GAT GAA GAT GAG GAA GAT GAA GAG GAG GAT CCC TCC TCC CCA 1104
Phe Asn Asp Glu Asp Glu Glu Asp Glu Glu Glu Asp Pro Ser Ser Pro 
355 360 365 

GAA TAAAGACAGG AGAGAACTCA TGTTTTAAAA AAAAAAAAAA ACTCGAG 1154 
Glu 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 369 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Ile Ile Ser Thr Pro Gln Arg Ile Ala Asn Ser Gly Ser Val Leu
1 5 10 15

Ile Gly Asn Pro Tyr Thr Pro Ala Pro Ala Met Val Thr Gln Thr His
20 25 30

Ile Ala Glu Ala Ala Gly Trp Val Pro Ser Lys Arg Ser Lys Lys Gly
35 40 45

Asp Lys Asn Gly Lys Gly Leu Arg His Phe Ser Met Lys Val Cys Glu
50 55 60

Lys Val Gln Arg Lys Gly Thr Thr Ser Tyr Asn Glu Val Ala Asp Glu
65 70 75 80

Leu Val Ser Glu Phe Thr Asn Ser Asn Asn His Leu Ala Ala Asp Ser
85 90 95

Ala Tyr Asp Gln Glu Asn Ile Arg Arg Arg Val Tyr Asp Ala Leu Asn
100 105 110

Val Leu Met Ala Met Asn Ile Ile Ser Lys Glu Lys Lys Glu Ile Lys
115 120 125

Trp Ile Gly Leu Pro Thr Asn Ser Ala Gln Glu Cys Gln Asn Leu Glu
130 135 140

Ile Glu Lys Gln Arg Arg Ile Glu Arg Ile Lys Gln Lys Arg Ala Gln
145 150 155 160

Leu Gln Glu Leu Leu Leu Gln Gln Ile Ala Phe Lys Asn Leu Val Gln
165 170 175

Arg Asn Arg Gln Asn Glu Gln Gln Asn Gln Gly Pro Pro Ala Val Asn
180 185 190

Ser Thr Ile Gln Leu Pro Phe Ile Ile Ile Asn Thr Ser Arg Lys Thr
195 200 205

Val Ile Asp Cys Ser Ile Ser Ser Asp Lys Phe Glu Tyr Leu Phe Asn
210 215 220

Phe Asp Asn Thr Phe Glu Ile His Asp Asp Ile Glu Val Leu Lys Arg
225 230 235 240

Met Gly Met Ser Phe Gly Leu Glu Ser Gly Lys Cys Ser Leu Glu Asp
245 250 255

Leu Lys Ile Ala Arg Ser Leu Val Pro Lys Ala Leu Glu Gly Tyr Ile
260 265 270

Thr Asp Ile Ser Thr Gly Pro Ser Trp Leu Asn Gln Gly Leu Leu Leu
275 280 285

Asn Ser Thr Gln Ser Val Ser Asn Leu Asp Pro Thr Thr Gly Ala Thr
290 295 300

Val Pro Gln Ser Ser Val Asn Gln Gly Leu Cys Leu Asp Ala Glu Val
305 310 315 320

Ala Leu Ala Thr Gly Gln Leu Pro Ala Ser Asn Ser His Gln Ser Ser
325 330 335 

Ser Ala Ala Ser His Phe Ser Glu Ser Arg Gly Glu Thr Pro Cys Ser 
340 345 350 

Phe Asn Asp Glu Asp Glu Glu Asp Glu Glu Glu Asp Pro Ser Ser Pro 
355 360 365 

Glu 
(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1157 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1110

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

ATG ATT ATA AGC ACA CCG CAG AGA ATT GCC AAT TCA GGA AGT GTT CTG 48
Met Ile Ile Ser Thr Pro Gln Arg Ile Ala Asn Ser Gly Ser Val Leu
1 5 10 15

ATT GGG AAT CCA TAT ACC CCT GCA CCC GCA ATG GTC ACT CAG ACT CAC 96
Ile Gly Asn Pro Tyr Thr Pro Ala Pro Ala Met Val Thr Gln Thr His
20 25 30

ATA GCT GAG GCT GCT GGC TGG GTT CCC AGT AAA CGA AGC AAA AAA GGA 144
Ile Ala Glu Ala Ala Gly Trp Val Pro Ser Lys Arg Ser Lys Lys Gly
35 40 45

GAT AAA AAT GGG AAA GGC TTG AGA CAT TTT TCA ATG AAG GTG TGT GAG 192
Asp Lys Asn Gly Lys Gly Leu Arg His Phe Ser Met Lys Val Cys Glu
50 55 60

AAA GTT CAG CGG AAA GGC ACA ACT TCA TAC AAT GAG GTA GCT GAT GAG 240
Lys Val Gln Arg Lys Gly Thr Thr Ser Tyr Asn Glu Val Ala Asp Glu
65 70 75 80

CTG GTA TCT GAG TTT ACC AAC TCA AAT AAC CAT CTG GCA GCT GAT TCG 288
Leu Val Ser Glu Phe Thr Asn Ser Asn Asn His Leu Ala Ala Asp Ser
85 90 95

CAG GCT TAT GAT CAG GAG AAC ATT AGA CGA AGA GTT TAT GAT GCT TTA 336
Gln Ala Tyr Asp Gln Glu Asn Ile Arg Arg Arg Val Tyr Asp Ala Leu
100 105 110

AAT GTA CTA ATG GCG ATG AAC ATA ATT TCA AAG GAA AAA AAA GAA ATC 384
Asn Val Leu Met Ala Met Asn Ile Ile Ser Lys Glu Lys Lys Glu Ile
115 120 125

AAG TGG ATT GGC CTG CCT ACC AAT TCT GCT CAG GAA TGC CAG AAC CTG 432
Lys Trp Ile Gly Leu Pro Thr Asn Ser Ala Gln Glu Cys Gln Asn Leu
130 135 140

GAA ATC GAG AAG CAG AGG CGG ATA GAA CGG ATA AAG CAG AAG CGA GCC 480
Glu Ile Glu Lys Gln Arg Arg Ile Glu Arg Ile Lys Gln Lys Arg Ala
145 150 155 160

CAG CTA CAA GAA CTT CTC CTT CAG CAA ATT GCT TTT AAA AAC CTG GTA 528
Gln Leu Gln Glu Leu Leu Leu Gln Gln Ile Ala Phe Lys Asn Leu Val
165 170 175

CAG AGA AAT CGA CAA AAT GAA CAA CAA AAC CAG GGC CCT CCA GCT GTG 576
Gln Arg Asn Arg Gln Asn Glu Gln Gln Asn Gln Gly Pro Pro Ala Val
180 185 190

AAT TCC ACC ATT CAG CTG CCA TTT ATA ATC ATT AAT ACA AGC AGG AAA 624
Asn Ser Thr Ile Gln Leu Pro Phe Ile Ile Ile Asn Thr Ser Arg Lys
195 200 205

ACA GTC ATA GAC TGC AGC ATC TCC AGT GAC AAA TTT GAA TAC CTT TTT 672
Thr Val Ile Asp Cys Ser Ile Ser Ser Asp Lys Phe Glu Tyr Leu Phe
210 215 220

AAT TTT GAT AAC ACC TTT GAG ATC CAC GAC GAC ATA GAG GTA CTG AAG 720
Asn Phe Asp Asn Thr Phe Glu Ile His Asp Asp Ile Glu Val Leu Lys
225 230 235 240

CGG ATG GGA ATG TCC TTT GGT CTG GAG TCA GGC AAA TGC TCT CTG GAG 768
Arg Met Gly Met Ser Phe Gly Leu Glu Ser Gly Lys Cys Ser Leu Glu
245 250 255

GAT CTG AAA ATC GCA AGA TCC CTG GTT CCA AAA GCT TTA GAA GGC TAT 816
Asp Leu Lys Ile Ala Arg Ser Leu Val Pro Lys Ala Leu Glu Gly Tyr
260 265 270

ATT ACA GAT ATC TCC ACA GGA CCT TCT TGG TTA AAT CAG GGA CTA CTT 864
Ile Thr Asp Ile Ser Thr Gly Pro Ser Trp Leu Asn Gln Gly Leu Leu
275 280 285

TTG AAC TCT ACC CAA TCA GTT TCA AAT TTA GAC CCG ACC ACC GGT GCC 912
Leu Asn Ser Thr Gln Ser Val Ser Asn Leu Asp Pro Thr Thr Gly Ala
290 295 300

ACT GTA CCC CAA TCA AGT GTA AAC CAA GGG TTG TGC TTG GAT GCT GAA 960
Thr Val Pro Gln Ser Ser Val Asn Gln Gly Leu Cys Leu Asp Ala Glu
305 310 315 320

GTG GCC TTA GCA ACT GGG CAG CTC CCT GCC TCA AAC AGT CAC CAG TCC 1008
Val Ala Leu Ala Thr Gly Gln Leu Pro Ala Ser Asn Ser His Gln Ser
325 330 335 

AGC AGT GCA GCC TCT CAC TTC TCG GAG TCC CGC GGC GAG ACC CCC TGT 1056 
Ser Ser Ala Ala Ser His Phe Ser Glu Ser Arg Gly Glu Thr Pro Cys 
340 345 350 

TCA TTC AAC GAT GAA GAT GAG GAA GAT GAA GAG GAG GAT CCC TCC TCC 1104 
Ser Phe Asn Asp Glu Asp Glu Glu Asp Glu Glu Glu Asp Pro Ser Ser 
355 360 365 

CCA GAA TAAAGACAGG AGAGAACTCA TGTTTTAAAA AAAAAAAAAA ACTCGAG 1157 
Pro Glu 
370 

(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 370 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Ile Ile Ser Thr Pro Gln Arg Ile Ala Asn Ser Gly Ser Val Leu
1 5 10 15

Ile Gly Asn Pro Tyr Thr Pro Ala Pro Ala Met Val Thr Gln Thr His
20 25 30

Ile Ala Glu Ala Ala Gly Trp Val Pro Ser Lys Arg Ser Lys Lys Gly
35 40 45

Asp Lys Asn Gly Lys Gly Leu Arg His Phe Ser Met Lys Val Cys Glu
50 55 60

Lys Val Gln Arg Lys Gly Thr Thr Ser Tyr Asn Glu Val Ala Asp Glu
65 70 75 80

Leu Val Ser Glu Phe Thr Asn Ser Asn Asn His Leu Ala Ala Asp Ser
85 90 95

Gln Ala Tyr Asp Gln Glu Asn Ile Arg Arg Arg Val Tyr Asp Ala Leu
100 105 110

Asn Val Leu Met Ala Met Asn Ile Ile Ser Lys Glu Lys Lys Glu Ile
115 120 125

Lys Trp Ile Gly Leu Pro Thr Asn Ser Ala Gln Glu Cys Gln Asn Leu
130 135 140

Glu Ile Glu Lys Gln Arg Arg Ile Glu Arg Ile Lys Gln Lys Arg Ala
145 150 155 160

Gln Leu Gln Glu Leu Leu Leu Gln Gln Ile Ala Phe Lys Asn Leu Val
165 170 175

Gln Arg Asn Arg Gln Asn Glu Gln Gln Asn Gln Gly Pro Pro Ala Val
180 185 190

Asn Ser Thr Ile Gln Leu Pro Phe Ile Ile Ile Asn Thr Ser Arg Lys
195 200 205

Thr Val Ile Asp Cys Ser Ile Ser Ser Asp Lys Phe Glu Tyr Leu Phe
210 215 220

Asn Phe Asp Asn Thr Phe Glu Ile His Asp Asp Ile Glu Val Leu Lys
225 230 235 240

Arg Met Gly Met Ser Phe Gly Leu Glu Ser Gly Lys Cys Ser Leu Glu
245 250 255

Asp Leu Lys Ile Ala Arg Ser Leu Val Pro Lys Ala Leu Glu Gly Tyr
260 265 270

Ile Thr Asp Ile Ser Thr Gly Pro Ser Trp Leu Asn Gln Gly Leu Leu
275 280 285

Leu Asn Ser Thr Gln Ser Val Ser Asn Leu Asp Pro Thr Thr Gly Ala
290 295 300

Thr Val Pro Gln Ser Ser Val Asn Gln Gly Leu Cys Leu Asp Ala Glu
305 310 315 320

Val Ala Leu Ala Thr Gly Gln Leu Pro Ala Ser Asn Ser His Gln Ser
325 330 335 

Ser Ser Ala Ala Ser His Phe Ser Glu Ser Arg Gly Glu Thr Pro Cys 
340 345 350 

Ser Phe Asn Asp Glu Asp Glu Glu Asp Glu Glu Glu Asp Pro Ser Ser 
355 360 365 

Pro Glu 
370 
(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1202 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1155

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

ATG ATT ATA AGC ACA CCG CAG AGA ATT GCC AAT TCA GGA AGT GTT CTG 48 
Met Ile Ile Ser Thr Pro Gln Arg Ile Ala Asn Ser Gly Ser Val Leu
1 5 10 15

ATT GGG AAT CCA TAT ACC CCT GCA CCC GCA ATG GTC ACT CAG ACT CAC 96
Ile Gly Asn Pro Tyr Thr Pro Ala Pro Ala Met Val Thr Gln Thr His
20 25 30

ATA GCT GAG GCT GCT GGC TGG GTT CCC AGT GAT AGA AAA CGA GCT AGA 144
Ile Ala Glu Ala Ala Gly Trp Val Pro Ser Asp Arg Lys Arg Ala Arg
35 40 45

GAA TTT ATA GAC TCT GAT TTT TCA GAA AGT AAA CGA AGC AAA AAA GGA 192
Glu Phe Ile Asp Ser Asp Phe Ser Glu Ser Lys Arg Ser Lys Lys Gly
50 55 60

GAT AAA AAT GGG AAA GGC TTG AGA CAT TTT TCA ATG AAG GTG TGT GAG 240
Asp Lys Asn Gly Lys Gly Leu Arg His Phe Ser Met Lys Val Cys Glu
65 70 75 80

AAA GTT CAG CGG AAA GGC ACA ACT TCA TAC AAT GAG GTA GCT GAT GAG 288
Lys Val Gln Arg Lys Gly Thr Thr Ser Tyr Asn Glu Val Ala Asp Glu
85 90 95

CTG GTA TCT GAG TTT ACC AAC TCA AAT AAC CAT CTG GCA GCT GAT TCG 336
Leu Val Ser Glu Phe Thr Asn Ser Asn Asn His Leu Ala Ala Asp Ser
100 105 110

GCT TAT GAT CAG GAG AAC ATT AGA CGA AGA GTT TAT GAT GCT TTA AAT 384
Ala Tyr Asp Gln Glu Asn Ile Arg Arg Arg Val Tyr Asp Ala Leu Asn
115 120 125

GTA CTA ATG GCG ATG AAC ATA ATT TCA AAG GAA AAA AAA GAA ATC AAG 432
Val Leu Met Ala Met Asn Ile Ile Ser Lys Glu Lys Lys Glu Ile Lys
130 135 140

TGG ATT GGC CTG CCT ACC AAT TCT GCT CAG GAA TGC CAG AAC CTG GAA 480
Trp Ile Gly Leu Pro Thr Asn Ser Ala Gln Glu Cys Gln Asn Leu Glu
145 150 155 160

ATC GAG AAG CAG AGG CGG ATA GAA CGG ATA AAG CAG AAG CGA GCC CAG 528
Ile Glu Lys Gln Arg Arg Ile Glu Arg Ile Lys Gln Lys Arg Ala Gln
165 170 175

CTA CAA GAA CTT CTC CTT CAG CAA ATT GCT TTT AAA AAC CTG GTA CAG 576
Leu Gln Glu Leu Leu Leu Gln Gln Ile Ala Phe Lys Asn Leu Val Gln
180 185 190

AGA AAT CGA CAA AAT GAA CAA CAA AAC CAG GGC CCT CCA GCT GTG AAT 624
Arg Asn Arg Gln Asn Glu Gln Gln Asn Gln Gly Pro Pro Ala Val Asn
195 200 205

TCC ACC ATT CAG CTG CCA TTT ATA ATC ATT AAT ACA AGC AGG AAA ACA 672
Ser Thr Ile Gln Leu Pro Phe Ile Ile Ile Asn Thr Ser Arg Lys Thr
210 215 220

GTC ATA GAC TGC AGC ATC TCC AGT GAC AAA TTT GAA TAC CTT TTT AAT 720
Val Ile Asp Cys Ser Ile Ser Ser Asp Lys Phe Glu Tyr Leu Phe Asn
225 230 235 240

TTT GAT AAC ACC TTT GAG ATC CAC GAC GAC ATA GAG GTA CTG AAG CGG 768
Phe Asp Asn Thr Phe Glu Ile His Asp Asp Ile Glu Val Leu Lys Arg
245 250 255

ATG GGA ATG TCC TTT GGT CTG GAG TCA GGC AAA TGC TCT CTG GAG GAT 816
Met Gly Met Ser Phe Gly Leu Glu Ser Gly Lys Cys Ser Leu Glu Asp
260 265 270

CTG AAA ATC GCA AGA TCC CTG GTT CCA AAA GCT TTA GAA GGC TAT ATT 864
Leu Lys Ile Ala Arg Ser Leu Val Pro Lys Ala Leu Glu Gly Tyr Ile
275 280 285

ACA GAT ATC TCC ACA GGA CCT TCT TGG TTA AAT CAG GGA CTA CTT TTG 912
Thr Asp Ile Ser Thr Gly Pro Ser Trp Leu Asn Gln Gly Leu Leu Leu
290 295 300

AAC TCT ACC CAA TCA GTT TCA AAT TTA GAC CCG ACC ACC GGT GCC ACT 960
Asn Ser Thr Gln Ser Val Ser Asn Leu Asp Pro Thr Thr Gly Ala Thr
305 310 315 320

GTA CCC CAA TCA AGT GTA AAC CAA GGG TTG TGC TTG GAT GCT GAA GTG 1008
Val Pro Gln Ser Ser Val Asn Gln Gly Leu Cys Leu Asp Ala Glu Val
325 330 335

GCC TTA GCA ACT GGG CAG CTC CCT GCC TCA AAC AGT CAC CAG TCC AGC 1056
Ala Leu Ala Thr Gly Gln Leu Pro Ala Ser Asn Ser His Gln Ser Ser
340 345 350

AGT GCA GCC TCT CAC TTC TCG GAG TCC CGC GGC GAG ACC CCC TGT TCA 1104
Ser Ala Ala Ser His Phe Ser Glu Ser Arg Gly Glu Thr Pro Cys Ser
355 360 365

TTC AAC GAT GAA GAT GAG GAA GAT GAA GAG GAG GAT CCC TCC TCC CCA 1152
Phe Asn Asp Glu Asp Glu Glu Asp Glu Glu Glu Asp Pro Ser Ser Pro
370 375 380

GAA TAAAGACAGG AGAGAACTCA TGTTTTAAAA AAAAAAAAAA ACTCGAG 1202
Glu
385



(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 385 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: 

Met Ile Ile Ser Thr Pro Gln Arg Ile Ala Asn Ser Gly Ser Val Leu
1 5 10 15

Ile Gly Asn Pro Tyr Thr Pro Ala Pro Ala Met Val Thr Gln Thr His
20 25 30

Ile Ala Glu Ala Ala Gly Trp Val Pro Ser Asp Arg Lys Arg Ala Arg
35 40 45

Glu Phe Ile Asp Ser Asp Phe Ser Glu Ser Lys Arg Ser Lys Lys Gly
50 55 60

Asp Lys Asn Gly Lys Gly Leu Arg His Phe Ser Met Lys Val Cys Glu
65 70 75 80

Lys Val Gln Arg Lys Gly Thr Thr Ser Tyr Asn Glu Val Ala Asp Glu
85 90 95

Leu Val Ser Glu Phe Thr Asn Ser Asn Asn His Leu Ala Ala Asp Ser
100 105 110

Ala Tyr Asp Gln Glu Asn Ile Arg Arg Arg Val Tyr Asp Ala Leu Asn
115 120 125

Val Leu Met Ala Met Asn Ile Ile Ser Lys Glu Lys Lys Glu Ile Lys
130 135 140

Trp Ile Gly Leu Pro Thr Asn Ser Ala Gln Glu Cys Gln Asn Leu Glu
145 150 155 160

Ile Glu Lys Gln Arg Arg Ile Glu Arg Ile Lys Gln Lys Arg Ala Gln
165 170 175

Leu Gln Glu Leu Leu Leu Gln Gln Ile Ala Phe Lys Asn Leu Val Gln
180 185 190

Arg Asn Arg Gln Asn Glu Gln Gln Asn Gln Gly Pro Pro Ala Val Asn
195 200 205

Ser Thr Ile Gln Leu Pro Phe Ile Ile Ile Asn Thr Ser Arg Lys Thr
210 215 220

Val Ile Asp Cys Ser Ile Ser Ser Asp Lys Phe Glu Tyr Leu Phe Asn
225 230 235 240

Phe Asp Asn Thr Phe Glu Ile His Asp Asp Ile Glu Val Leu Lys Arg
245 250 255

Met Gly Met Ser Phe Gly Leu Glu Ser Gly Lys Cys Ser Leu Glu Asp
260 265 270

Leu Lys Ile Ala Arg Ser Leu Val Pro Lys Ala Leu Glu Gly Tyr Ile
275 280 285

Thr Asp Ile Ser Thr Gly Pro Ser Trp Leu Asn Gln Gly Leu Leu Leu
290 295 300

Asn Ser Thr Gln Ser Val Ser Asn Leu Asp Pro Thr Thr Gly Ala Thr
305 310 315 320

Val Pro Gln Ser Ser Val Asn Gln Gly Leu Cys Leu Asp Ala Glu Val
325 330 335

Ala Leu Ala Thr Gly Gln Leu Pro Ala Ser Asn Ser His Gln Ser Ser
340 345 350

Ser Ala Ala Ser His Phe Ser Glu Ser Arg Gly Glu Thr Pro Cys Ser
355 360 365

Phe Asn Asp Glu Asp Glu Glu Asp Glu Glu Glu Asp Pro Ser Ser Pro
370 375 380 

Glu 
385 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 

Ser Asp Arg Lys Arg Ala Arg Glu Phe Ile Asp Ser Asp Phe Ser Glu
1 5 10 15
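As an illustrative cross-check of the listing (not part of the specification), the Python sketch below converts the three-letter residue codes to one-letter form and searches for the 16-residue SEQ ID NO: 9 peptide in residues 1 to 64 of SEQ ID NO: 8 and residues 1 to 48 of SEQ ID NO: 4, copied from this listing. The helper names `to_one_letter` and `find_motif` are hypothetical, and an exact substring match is an assumed comparison method. Under those assumptions the peptide is found at residues 42 to 57 of SEQ ID NO: 8 and is absent from the corresponding stretch of SEQ ID NO: 4.

```python
# Illustrative sketch only: helper names and the use of an exact
# substring search are assumptions, not part of the sequence listing.
from typing import Optional

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert whitespace-separated three-letter residue codes to one-letter codes."""
    return "".join(THREE_TO_ONE[code] for code in three_letter.split())


def find_motif(sequence: str, motif: str) -> Optional[int]:
    """Return the 1-based start of an exact motif match, or None if absent."""
    index = sequence.find(motif)
    return index + 1 if index >= 0 else None


# SEQ ID NO: 9 (16 residues).
SEQ9 = to_one_letter(
    "Ser Asp Arg Lys Arg Ala Arg Glu Phe Ile Asp Ser Asp Phe Ser Glu"
)

# Residues 1-64 of SEQ ID NO: 8.
SEQ8_1_64 = to_one_letter(
    "Met Ile Ile Ser Thr Pro Gln Arg Ile Ala Asn Ser Gly Ser Val Leu "
    "Ile Gly Asn Pro Tyr Thr Pro Ala Pro Ala Met Val Thr Gln Thr His "
    "Ile Ala Glu Ala Ala Gly Trp Val Pro Ser Asp Arg Lys Arg Ala Arg "
    "Glu Phe Ile Asp Ser Asp Phe Ser Glu Ser Lys Arg Ser Lys Lys Gly"
)

# Residues 1-48 of SEQ ID NO: 4.
SEQ4_1_48 = to_one_letter(
    "Met Ile Ile Ser Thr Pro Gln Arg Ile Ala Asn Ser Gly Ser Val Leu "
    "Ile Gly Asn Pro Tyr Thr Pro Ala Pro Ala Met Val Thr Gln Thr His "
    "Ile Ala Glu Ala Ala Gly Trp Val Pro Ser Lys Arg Ser Lys Lys Gly"
)

start = find_motif(SEQ8_1_64, SEQ9)
print(start, start + len(SEQ9) - 1)  # 42 57
print(find_motif(SEQ4_1_48, SEQ9))  # None
```

An exact search is enough here because the listing gives the peptide verbatim; comparing variant sequences would call for an alignment method instead.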

(2) INFORMATION FOR SEQ ID NO:10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 410 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: 

Met Ala Lys Asp Ala Gly Leu Ile Glu Ala Asn Gly Glu Leu Lys Val
1 5 10 15

Phe Ile Asp Gln Asn Leu Ser Pro Gly Lys Gly Val Val Ser Leu Val
20 25 30

Ala Val His Pro Ser Thr Val Asn Pro Leu Gly Lys Gln Leu Leu Pro
35 40 45

Lys Thr Phe Gly Gln Ser Asn Val Asn Ile Ala Gln Gln Val Val Ile
50 55 60

Gly Thr Pro Gln Arg Pro Ala Ala Ser Asn Thr Leu Val Val Gly Ser
65 70 75 80

Pro His Thr Pro Ser Thr His Phe Ala Ser Gln Asn Gln Pro Ser Asp
85 90 95

Ser Ser Pro Trp Ser Ala Gly Lys Arg Asn Arg Lys Gly Glu Lys Asn
100 105 110

Gly Lys Gly Leu Arg His Phe Ser Met Lys Val Cys Glu Lys Val Gln
115 120 125

Arg Lys Gly Thr Thr Ser Tyr Asn Glu Val Ala Asp Glu Leu Val Ala
130 135 140

Glu Phe Ser Ala Ala Asp Asn His Ile Leu Pro Asn Glu Ser Ala Tyr
145 150 155 160

Asp Gln Lys Asn Ile Arg Arg Arg Val Tyr Asp Ala Leu Asn Val Leu
165 170 175

Met Ala Met Asn Ile Ile Ser Lys Glu Lys Lys Glu Ile Lys Trp Ile
180 185 190

Gly Leu Pro Thr Asn Ser Ala Gln Glu Cys Gln Asn Leu Glu Val Glu
195 200 205

Arg Gln Arg Arg Leu Glu Arg Ile Lys Gln Lys Gln Ser Gln Leu Gln
210 215 220

Glu Leu Ile Leu Gln Gln Ile Ala Phe Lys Asn Leu Val Gln Arg Asn
225 230 235 240

Arg His Ala Glu Gln Gln Ala Ser Arg Pro Pro Pro Pro Asn Ser Val
245 250 255

Ile His Leu Pro Phe Ile Ile Val Asn Thr Ser Lys Lys Thr Val Ile
260 265 270

Asp Cys Ser Ile Ser Asn Asp Lys Phe Glu Tyr Leu Phe Asn Phe Asp
275 280 285

Asn Thr Phe Glu Ile His Asp Asp Ile Glu Val Leu Lys Arg Met Gly
290 295 300

Met Ala Cys Gly Leu Glu Ser Gly Ser Cys Ser Ala Glu Asp Leu Lys
305 310 315 320

Met Ala Arg Ser Leu Val Pro Lys Ala Leu Glu Pro Tyr Val Thr Glu
325 330 335

Met Ala Gln Gly Thr Val Gly Gly Val Phe Ile Thr Thr Ala Gly Ser
340 345 350

Thr Ser Asn Gly Thr Arg Phe Ser Ala Ser Asp Leu Thr Asn Gly Ala
355 360 365

Asp Gly Met Leu Ala Thr Ser Ser Asn Gly Ser Gln Tyr Ser Gly Ser
370 375 380 

Arg Val Glu Thr Pro Val Ser Tyr Val Gly Glu Asp Asp Glu Glu Asp 
385 390 395 400 

Asp Asp Phe Asn Glu Asn Asp Glu Asp Asp 
405 410 

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 410 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Ala Lys Asp Ala Ser Leu Ile Glu Ala Asn Gly Glu Leu Lys Val
1 5 10 15

Phe Ile Asp Gln Asn Leu Ser Pro Gly Lys Gly Val Val Ser Leu Val
20 25 30

Ala Val His Pro Ser Thr Val Asn Thr Leu Gly Lys Gln Leu Leu Pro
35 40 45

Lys Thr Phe Gly Gln Ser Asn Val Asn Ile Thr Gln Gln Val Val Ile
50 55 60

Gly Thr Pro Gln Arg Pro Ala Ala Ser Asn Thr Ile Val Val Gly Ser
65 70 75 80

Pro His Thr Pro Asn Thr His Phe Val Ser Gln Asn Gln Thr Ser Asp
85 90 95

Ser Ser Pro Trp Ser Ala Gly Lys Arg Asn Arg Lys Gly Glu Lys Asn
100 105 110

Gly Lys Gly Leu Arg His Phe Ser Met Lys Val Cys Glu Lys Val Gln
115 120 125

Arg Lys Gly Thr Thr Ser Tyr Asn Glu Val Ala Asp Glu Leu Val Ala
130 135 140

Glu Phe Ser Ala Ala Asp Asn His Ile Leu Pro Asn Glu Ser Ala Tyr
145 150 155 160

Asp Gln Lys Asn Ile Arg Arg Arg Val Tyr Asp Ala Leu Asn Val Leu
165 170 175

Met Ala Met Asn Ile Ile Ser Lys Glu Lys Lys Glu Ile Lys Trp Ile
180 185 190

Gly Leu Pro Thr Asn Ser Ala Gln Glu Cys Gln Asn Leu Glu Val Glu
195 200 205

Arg Gln Arg Arg Leu Glu Arg Ile Lys Gln Lys Gln Ser Gln Leu Gln
210 215 220

Glu Leu Ile Leu Gln Gln Ile Ala Phe Lys Asn Leu Val Gln Arg Asn
225 230 235 240

Arg Gln Ala Glu Gln Gln Ala Arg Arg Pro Pro Pro Pro Asn Ser Val
245 250 255

Ile His Leu Pro Phe Ile Ile Val Asn Thr Ser Arg Lys Thr Val Ile
260 265 270

Asp Cys Ser Ile Ser Asn Asp Lys Phe Glu Tyr Leu Phe Asn Phe Asp
275 280 285

Asn Thr Phe Glu Ile His Asp Asp Ile Glu Val Leu Lys Arg Met Gly
290 295 300

Met Ala Cys Gly Leu Glu Ser Gly Asn Cys Ser Ala Glu Asp Leu Lys
305 310 315 320

Val Ala Arg Ser Leu Val Pro Lys Ala Leu Glu Pro Tyr Val Thr Glu
325 330 335

Met Ala Gln Gly Ser Ile Gly Gly Val Phe Val Thr Thr Thr Gly Ser
340 345 350

Thr Ser Asn Gly Thr Arg Leu Ser Ala Ser Asp Leu Ser Asn Gly Ala
355 360 365

Asp Gly Met Leu Ala Thr Ser Ser Asn Gly Ser Gln Tyr Ser Gly Ser
370 375 380

Arg Val Glu Thr Pro Val Ser Tyr Val Gly Glu Asp Asp Asp Asp Asp
385 390 395 400

Asp Asp Phe Asn Glu Asn Asp Glu Glu Asp
405 410

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2457 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 87..1397

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

GGGATCGAGC CCTCGCCGAG GCCTGCCGCC ATGGGCCCGC GCCGCCGCCG CCGCCTGTCA 60 

CCCGGGCCGC GCGGGCCGTG AGCGTC ATG GCC TTG GCC GGG GCC CCT GCG GGC 113 

Met Ala Leu Ala Gly Ala Pro Ala Gly 
1 5 

GGC CCA TGC GCG CCG GCG CTG GAG GCC CTG CTC GGG GCC GGC GCG CTG 161 
Gly Pro Cys Ala Pro Ala Leu Glu Ala Leu Leu Gly Ala Gly Ala Leu
10 15 20 25

CGG CTG CTC GAC TCC TCG CAG ATC GTC ATC ATC TCC GCC GCG CAG GAC 209
Arg Leu Leu Asp Ser Ser Gln Ile Val Ile Ile Ser Ala Ala Gln Asp
30 35 40

GCC AGC GCC CCG CCG GCT CCC ACC GGC CCC GCG GCG CCC GCC GCC GGC 257
Ala Ser Ala Pro Pro Ala Pro Thr Gly Pro Ala Ala Pro Ala Ala Gly
45 50 55

CCC TGC GAC CCT GAC CTG CTG CTC TTC GCC ACA CCG CAG GCG CCC CGG 305
Pro Cys Asp Pro Asp Leu Leu Leu Phe Ala Thr Pro Gln Ala Pro Arg
60 65 70

CCC ACA CCC AGT GCG CCG CGG CCC GCG CTC GGC CGC CCG CCG GTG AAG 353
Pro Thr Pro Ser Ala Pro Arg Pro Ala Leu Gly Arg Pro Pro Val Lys
75 80 85

CGG AGG CTG GAC CTG GAA ACT GAC CAT CAG TAC CTG GCC GAG AGC AGT 401
Arg Arg Leu Asp Leu Glu Thr Asp His Gln Tyr Leu Ala Glu Ser Ser
90 95 100 105

GGG CCA GCT CGG GGC AGA GGC CGC CAT CCA GGA AAA GGT GTG AAA TCC 449
Gly Pro Ala Arg Gly Arg Gly Arg His Pro Gly Lys Gly Val Lys Ser
110 115 120

CCG GGG GAG AAG TCA CGC TAT GAG ACC TCA CTG AAT CTG ACC ACC AAG 497
Pro Gly Glu Lys Ser Arg Tyr Glu Thr Ser Leu Asn Leu Thr Thr Lys
125 130 135

CGC TTC CTG GAG CTG CTG AGC CAC TCG GCT GAC GGT GTC GTC GAC CTG 545
Arg Phe Leu Glu Leu Leu Ser His Ser Ala Asp Gly Val Val Asp Leu
140 145 150

AAC TGG GCT GCC GAG GTG CTG AAG GTG CAG AAG CGG CGC ATC TAT GAC 593
Asn Trp Ala Ala Glu Val Leu Lys Val Gln Lys Arg Arg Ile Tyr Asp
155 160 165

ATC ACC AAC GTC CTT GAG GGC ATC CAG CTC ATT GCC AAG AAG TCC AAG 641
Ile Thr Asn Val Leu Glu Gly Ile Gln Leu Ile Ala Lys Lys Ser Lys
170 175 180 185

AAC CAC ATC CAG TGG CTG GGC AGC CAC ACC ACA GTG GGC GTC GGC GGA 689
Asn His Ile Gln Trp Leu Gly Ser His Thr Thr Val Gly Val Gly Gly
190 195 200

CGG CTT GAG GGG TTG ACC CAG GAC CTC CGA CAG CTG CAG GAG AGC GAG 737
Arg Leu Glu Gly Leu Thr Gln Asp Leu Arg Gln Leu Gln Glu Ser Glu
205 210 215

CAG CAG CTG GAC CAC CTG ATG AAT ATC TGT ACT ACG CAG CTG CGC CTG 785
Gln Gln Leu Asp His Leu Met Asn Ile Cys Thr Thr Gln Leu Arg Leu
220 225 230

CTC TCC GAG GAC ACT GAC AGC CAG CGC CTG GCC TAC GTG ACG TGT CAG 833
Leu Ser Glu Asp Thr Asp Ser Gln Arg Leu Ala Tyr Val Thr Cys Gln
235 240 245

GAC CTT CGT AGC ATT GCA GAC CCT GCA GAG CAG ATG GTT ATG GTG ATC 881
Asp Leu Arg Ser Ile Ala Asp Pro Ala Glu Gln Met Val Met Val Ile
250 255 260 265

AAA GCC CCT CCT GAG ACC CAG CTC CAA GCC GTG GAC TCT TCG GAG AAC 929
Lys Ala Pro Pro Glu Thr Gln Leu Gln Ala Val Asp Ser Ser Glu Asn
270 275 280

TTT CAG ATC TCC CTT AAG AGC AAA CAA GGC CCG ATC GAT GTT TTC CTG 977
Phe Gln Ile Ser Leu Lys Ser Lys Gln Gly Pro Ile Asp Val Phe Leu
285 290 295

TGC CCT GAG GAG ACC GTA GGT GGG ATC AGC CCT GGG AAG ACC CCA TCC 1025
Cys Pro Glu Glu Thr Val Gly Gly Ile Ser Pro Gly Lys Thr Pro Ser
300 305 310

CAG GAG GTC ACT TCT GAG GAG GAG AAC AGG GCC ACT GAC TCT GCC ACC 1073
Gln Glu Val Thr Ser Glu Glu Glu Asn Arg Ala Thr Asp Ser Ala Thr
315 320 325

ATA GTG TCA CCA CCA CCA TCA TCT CCC CCC TCA TCC CTC ACC ACA GAT 1121
Ile Val Ser Pro Pro Pro Ser Ser Pro Pro Ser Ser Leu Thr Thr Asp
330 335 340 345

CCC AGC CAG TCT CTA CTC AGC CTG GAG CAA GAA CCG CTG TTG TCC CGG 1169
Pro Ser Gln Ser Leu Leu Ser Leu Glu Gln Glu Pro Leu Leu Ser Arg
350 355 360

ATG GGC AGC CTG CGG GCT CCC GTG GAC GAG GAC CGC CTG TCC CCG CTG 1217
Met Gly Ser Leu Arg Ala Pro Val Asp Glu Asp Arg Leu Ser Pro Leu
365 370 375

GTG CCG GCC GAC TCG CTC CTG GAG CAT GTG CGG GAG GAC TTC TCC GGC 1265
Val Ala Ala Asp Ser Leu Leu Glu His Val Arg Glu Asp Phe Ser Gly
380 385 390

CTC CTC CCT GAG GAG TTC ATC AGC CTT TCC CCA CCC CAC GAG GCC CTC 1313
Leu Leu Pro Glu Glu Phe Ile Ser Leu Ser Pro Pro His Glu Ala Leu
395 400 405

GAC TAC CAC TTC GGC CTC GAG GAG GGC GAG GGC ATC AGA GAC CTC TTC 1361
Asp Tyr His Phe Gly Leu Glu Glu Gly Glu Gly Ile Arg Asp Leu Phe
410 415 420 425 
GAC TGT GAC TTT GGG GAC CTC ACC CCC CTG GAT TTC TGACAGGGCT 1407
Asp Cys Asp Phe Gly Asp Leu Thr Pro Leu Asp Phe
430 435

TGGAGGGACC AGGGTTTCCA GAGTAGCTCA CCTTGTCTCT GCAGCCCTGG AGCCCCCTGT 1467

CCCTGGCCGT CCTCCCAGCC TGTTTGGAAA CATTTAATTT ATACCCCTCT CCTCTGTCTC 1527

CAGAAGCTTC TAGCTCTGGG GTCTGGCTAC CGCTAGGAGG CTGAGCAAGC CAGGAAGGGA 1587

AGGAGTCTGT GTGGTGTGTA TGTGCATGCA GCCTACACCC ACACGTGTGT ACCGGGGGTG 1647

AATGTGTGTG AGCATGTGTG TGTGCATGTA CCGGGGAATG AAGGTGAACA TACACCTCTG 1707

TGTGTGCACT GCAGACACGC CCCAGTGTGT CCACATGTGT GTGCATGAGT CCATCTCTGC 1767

GCGTGGGGGG GCTCTAACTG CACTTTCGGC CCTTTTGCTC GTGGGGTCCC ACAAGGCCCA 1827

GGGCAGTGCC TGCTCCCAGA ATCTGGTGCT CTGACCAGGC CAGGTGGGGA GGCTTTGGCT 1887

GGCTGGGCGT GTAGGACGGT GAGAGCACTT CTGTCTTAAA GGTTTTTTCT GATTGAAGCT 1947

TTAATGGAGC GTTATTTATT TATCGAGGCC TCTTTGGTGA GCCTGGGGAA TCAGCAAAAG 2007

GGGAGGAGGG GTGTGGGGTT GATACCCCAA CTCCCTCTAC CCTTGAGCAA GGGCAGGGGT 2067

CCCTGAGCTG TTCTTCTGCC CCATACTGAA GGAACTGAGG CCTGGGTGAT TTATTTATTG 2127

GGAAAGTGAG GGAGGGAGAC AGACTGACTG ACAGCCATGG GTGGTCAGAT GGTGGGGTGG 2187

GCCCTCTCCA GGGGGCCAGT TCAGGGCCCA GCTGCCCCCC AGGATGGATA TGAGATGGGA 2247

GAGGTGAGTG GGGGACCTTC ACTGATGTGG GCAGGAGGGG TGGTGAAGGC CTCCCCCAGC 2307

CCAGACCCTG TGGTCCCTCC TGCAGTGTCT GAAGCGCCTG CCTCCCCACT GCTCTGCCCC 2367

ACCCTCCAAT CTGCACTTTG ATTTGCTTCC TAACAGCTCT GTTCCCTCCT GCTTTGGTTT 2427

TAATAAATAT TTTGATGACG TTAAAAAAAA 2457



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 437 amino acids 
(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 

Met Ala Leu Ala Gly Ala Pro Ala Gly Gly Pro Cys Ala Pro Ala Leu
1 5 10 15

Glu Ala Leu Leu Gly Ala Gly Ala Leu Arg Leu Leu Asp Ser Ser Gln
20 25 30

Ile Val Ile Ile Ser Ala Ala Gln Asp Ala Ser Ala Pro Pro Ala Pro
35 40 45

Thr Gly Pro Ala Ala Pro Ala Ala Gly Pro Cys Asp Pro Asp Leu Leu
50 55 60

Leu Phe Ala Thr Pro Gln Ala Pro Arg Pro Thr Pro Ser Ala Pro Arg
65 70 75 80

Pro Ala Leu Gly Arg Pro Pro Val Lys Arg Arg Leu Asp Leu Glu Thr
85 90 95

Asp His Gln Tyr Leu Ala Glu Ser Ser Gly Pro Ala Arg Gly Arg Gly
100 105 110

Arg His Pro Gly Lys Gly Val Lys Ser Pro Gly Glu Lys Ser Arg Tyr
115 120 125

Glu Thr Ser Leu Asn Leu Thr Thr Lys Arg Phe Leu Glu Leu Leu Ser
130 135 140

His Ser Ala Asp Gly Val Val Asp Leu Asn Trp Ala Ala Glu Val Leu
145 150 155 160

Lys Val Gln Lys Arg Arg Ile Tyr Asp Ile Thr Asn Val Leu Glu Gly
165 170 175

Ile Gln Leu Ile Ala Lys Lys Ser Lys Asn His Ile Gln Trp Leu Gly
180 185 190

Ser His Thr Thr Val Gly Val Gly Gly Arg Leu Glu Gly Leu Thr Gln
195 200 205

Asp Leu Arg Gln Leu Gln Glu Ser Glu Gln Gln Leu Asp His Leu Met
210 215 220

Asn Ile Cys Thr Thr Gln Leu Arg Leu Leu Ser Glu Asp Thr Asp Ser
225 230 235 240

Gln Arg Leu Ala Tyr Val Thr Cys Gln Asp Leu Arg Ser Ile Ala Asp
245 250 255

Pro Ala Glu Gln Met Val Met Val Ile Lys Ala Pro Pro Glu Thr Gln
260 265 270

Leu Gln Ala Val Asp Ser Ser Glu Asn Phe Gln Ile Ser Leu Lys Ser
275 280 285

Lys Gln Gly Pro Ile Asp Val Phe Leu Cys Pro Glu Glu Thr Val Gly
290 295 300

Gly Ile Ser Pro Gly Lys Thr Pro Ser Gln Glu Val Thr Ser Glu Glu
305 310 315 320

Glu Asn Arg Ala Thr Asp Ser Ala Thr Ile Val Ser Pro Pro Pro Ser
325 330 335

Ser Pro Pro Ser Ser Leu Thr Thr Asp Pro Ser Gln Ser Leu Leu Ser
340 345 350

Leu Glu Gln Glu Pro Leu Leu Ser Arg Met Gly Ser Leu Arg Ala Pro
355 360 365

Val Asp Glu Asp Arg Leu Ser Pro Leu Val Ala Ala Asp Ser Leu Leu
370 375 380

Glu His Val Arg Glu Asp Phe Ser Gly Leu Leu Pro Glu Glu Phe Ile
385 390 395 400

Ser Leu Ser Pro Pro His Glu Ala Leu Asp Tyr His Phe Gly Leu Glu
405 410 415

Glu Gly Glu Gly Ile Arg Asp Leu Phe Asp Cys Asp Phe Gly Asp Leu
420 425 430

Thr Pro Leu Asp Phe
435 
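The protein listing uses three-letter residue codes, with position numbers printed under each row. The minimal Python sketch below converts such a listing to one-letter code and, optionally, checks the result against the declared LENGTH field (437 for SEQ ID NO: 13). The helper name `to_one_letter` is hypothetical. Tokens that are not standard codes, such as the scanning error "Gin" for "Gln", are reported rather than silently dropped.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(listing, expected_length=None):
    """Convert a three-letter residue listing to a one-letter string.

    All-digit tokens (the position numbers) are skipped. Unknown tokens
    raise ValueError so that scanning errors are not silently ignored.
    """
    residues = [tok for tok in listing.split() if not tok.isdigit()]
    unknown = [tok for tok in residues if tok not in THREE_TO_ONE]
    if unknown:
        raise ValueError(f"unrecognised residue codes: {unknown}")
    seq = "".join(THREE_TO_ONE[tok] for tok in residues)
    if expected_length is not None and len(seq) != expected_length:
        raise ValueError(f"got {len(seq)} residues, expected {expected_length}")
    return seq


first_row = """Met Ala Leu Ala Gly Ala Pro Ala Gly Gly Pro Cys Ala Pro Ala Leu
1 5 10 15"""
print(to_one_letter(first_row, expected_length=16))  # MALAGAPAGGPCAPAL
```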

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "PRIMER" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
GCTCTAGAGC CCAGTATAGA 20 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "PRIMER"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GCTCTAGATG TCTCAAGCCT TTCCC 25 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Asp Glu Glu Asp Glu Glu Glu Asp Pro Ser Ser Pro Glu 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

Val Ala Leu Ala Thr Gly Gln Leu Pro Ala Ser Asn Ser His Gln
1 5 10 15 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CACCCGCAAT GGTCACT 17 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "PRIMER" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
ATGTCTCAAG CCTTTCCC 18 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 
GATAGAAAAC GAGCTAGAG 19
(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
TTCTGAGAAA TCAGAGTCTA 20
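Each primer record states a LENGTH and gives the sequence in blocks of ten followed by a base count. The minimal Python sketch below checks that the two agree and reports the GC fraction. The name `primer_stats` is hypothetical, and GC fraction is shown only as a common primer descriptor; the records themselves do not state it. Applying the sketch to the listing also shows a relationship that is visible in the records: SEQ ID NO: 19 is identical to the 3'-terminal 18 nucleotides of SEQ ID NO: 15, which carries seven extra 5' bases (GCTCTAG).

```python
def primer_stats(line, declared_length):
    """Return (sequence, GC fraction) for one primer line of the listing.

    `line` is the sequence line as printed, e.g. "GCTCTAGAGC CCAGTATAGA 20";
    a trailing all-digit token is treated as the printed count and dropped.
    Raises ValueError if the bases do not match the declared LENGTH.
    """
    tokens = line.split()
    if tokens and tokens[-1].isdigit():
        tokens = tokens[:-1]
    seq = "".join(tokens).upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError(f"not a plain DNA sequence: {line!r}")
    if len(seq) != declared_length:
        raise ValueError(f"{len(seq)} bases, but LENGTH says {declared_length}")
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return seq, gc


seq14, gc14 = primer_stats("GCTCTAGAGC CCAGTATAGA 20", 20)
seq15, _ = primer_stats("GCTCTAGATG TCTCAAGCCT TTCCC 25", 25)
seq19, _ = primer_stats("ATGTCTCAAG CCTTTCCC 18", 18)
print(gc14)                   # 0.5
print(seq15.endswith(seq19))  # True
```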
CLAIMS

1. An assay for a putative regulator of cell cycle 
progression which comprises: 

a. expressing in a cell a protein comprising (i) the
E region and sufficient C-terminal residues
thereof of a DP-3 protein to provide a functional
nuclear localisation signal (NLS) and (ii) a
marker for nuclear localization; and

b. determining the degree of nuclear localization in 
the presence and absence of said putative 
regulator.

2. An assay according to claim 1 wherein the NLS comprises 
the sequence: 

SDRKRAREFIDSDFSE (SEQ ID NO. 9) 

3. An assay according to claim 1 or 2 wherein the number
of C-terminal residues is from 8 to 20. 

4. An assay for a putative regulator of cell cycle
progression which comprises: 

a. expressing in a cell a protein comprising (i) the 
nuclear localisation signal of E2F-1 and (ii) a 
marker for nuclear localization; and 

b. determining the degree of nuclear localization in 
the presence and absence of said putative 
regulator. 

5. An assay according to any one of claims 1 to 4 wherein 
the cell is a yeast, insect or mammalian cell. 

6. An assay according to claim 5 wherein the mammalian 
cell is a primate cell. 

7. An assay according to any one of claims 1 to 6 wherein 
the marker comprises an antigenic determinant bindable by an 
antibody. 
8. An assay according to any one of claims 1 to 6 wherein
the marker comprises an enzyme capable of causing a colour
change to a substrate.

9. An assay according to any one of claims 1 to 6 wherein
the marker comprises a luciferase enzyme.

10. An assay according to any one of claims 1 to 6 wherein
the marker comprises a transcription factor or subunit
thereof, which transcription factor is capable of activating
an indicator gene.

11. An assay according to claim 10 wherein said marker 
comprises the DNA binding domain (DBD) or the 
transcriptional activation domain (TAD) of the yeast 
transcription factor GAL 4, and the indicator gene comprises 
a GAL 4 promoter. 

12. An assay according to claim 11 wherein the indicator 
gene is chloramphenicol acetyl transferase (CAT) or a 
luciferase.

13. An assay according to any one of the preceding claims
wherein the regulator is a peptide comprising all or part of 
a sequence which is from 60 to 100% homologous (identical) 
to a portion of the DP-3 E region of the same length. 

14. An assay according to any one of the preceding claims 
wherein the expression of the protein is a transient 
expression.

15. An assay according to any one of claims 1 to 13 wherein 
the cell is stably transfected with a construct expressing 
the protein. 

16. A method of directing expression of a protein in a cell
to the nucleus which comprises modifying said protein such
that it comprises an E region of a DP-3 protein or the
nuclear localisation signal of E2F-1.

17. A method according to claim 16 wherein said protein is 
a DP protein which does not normally comprise an E region.

18. A protein which does not normally comprise the E region
of a DP-3 protein, whose sequence has been modified to comprise said
E region.

19. An assay for a putative regulator of cell cycle 
progression which comprises:

a. expressing in a cell (i) an E- DP transcription 
factor or a portion thereof sufficient to form a 
heterodimer with an E2F transcription factor and

(ii) an E2F transcription factor or portion 
thereof sufficient to form a heterodimer with the 
DP transcription factor or portion thereof and
direct localisation of said heterodimer to the 
nucleus; and 

b. determining the degree of nuclear localization in 
the presence and absence of said putative 
regulator.

20. An assay according to claim 19 wherein the DP 
transcription factor is DP-1.

21. An assay according to claim 19 or 20 wherein the E2F is
E2F-1.
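Claims 1, 4 and 19 call for determining the degree of nuclear localization with and without the putative regulator, but they do not fix how that degree is measured. One common approach is assumed here and is not taken from the specification. The marker signal is quantified in the nucleus and the cytoplasm of each cell, for example from immunofluorescence images, and the two conditions are then compared. The Python sketch below computes a per-cell nuclear fraction, N / (N + C), together with the percentage of cells above a threshold. The function names, the 0.5 threshold and the intensities are all hypothetical.

```python
from statistics import mean


def nuclear_fraction(nuclear, cytoplasmic):
    """Fraction of a cell's marker signal that is nuclear: N / (N + C)."""
    total = nuclear + cytoplasmic
    if total <= 0:
        raise ValueError("cell has no measurable marker signal")
    return nuclear / total


def score_condition(cells, threshold=0.5):
    """Return (mean nuclear fraction, % of cells above threshold).

    `cells` is a list of (nuclear, cytoplasmic) intensity pairs.
    """
    if not cells:
        raise ValueError("no cells scored")
    fractions = [nuclear_fraction(n, c) for n, c in cells]
    percent = 100.0 * sum(f > threshold for f in fractions) / len(fractions)
    return mean(fractions), percent


def regulator_effect(without_regulator, with_regulator, threshold=0.5):
    """Change in both scores attributable to the regulator (test minus control)."""
    base_mean, base_pct = score_condition(without_regulator, threshold)
    test_mean, test_pct = score_condition(with_regulator, threshold)
    return test_mean - base_mean, test_pct - base_pct


# Illustrative intensities only, not data from the application.
control = [(80, 20), (90, 10)]
treated = [(30, 70), (60, 40)]
print(regulator_effect(control, treated))
```

In this sketch, a negative change means the regulator reduced nuclear accumulation of the marker. Deciding whether a given change is meaningful would still require replicate experiments and an appropriate statistical test.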

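Claim 13 defines the regulator peptide by its identity, from 60 to 100%, to a portion of the DP-3 E region of the same length. Read as an ungapped, position-by-position comparison (an interpretation, not a definition given in the claims), this can be checked with a short sliding-window scan. The Python sketch below does that. Its function names are hypothetical. The E region sequence itself is not reproduced in this section, so the NLS sequence quoted in claim 2 (SEQ ID NO: 9) serves only as a stand-in reference.

```python
def percent_identity(a, b):
    """Ungapped percent identity of two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_window_identity(peptide, region):
    """Highest identity of `peptide` to any same-length window of `region`."""
    n = len(peptide)
    if n == 0 or n > len(region):
        raise ValueError("peptide must be non-empty and no longer than region")
    return max(percent_identity(peptide, region[i:i + n])
               for i in range(len(region) - n + 1))


def meets_identity_range(peptide, region, low=60.0, high=100.0):
    """True if the best window identity lies within [low, high] percent."""
    return low <= best_window_identity(peptide, region) <= high


# Stand-in reference: SEQ ID NO: 9 from claim 2, not the full E region.
reference = "SDRKRAREFIDSDFSE"
print(percent_identity("SDRKAAREFIDSDFSE", reference))  # 93.75
print(meets_identity_range("RKRARE", reference))        # True
```

A gapped alignment would score some peptides differently. The ungapped window matches the claim's wording "of the same length", but that reading is still an assumption.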


BNSDOCID: <WO 974364 7A1_I_> 



PCT/GB97/01324 




Figure 1 